From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:17 +0100
Message-ID: <20260224163428.798198874@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 01/48] sched/eevdf: Fix HRTICK duration
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

The nominal duration for an EEVDF task to run is until its deadline, at
which point the deadline is moved ahead and a new task selection is done.

Try and predict the time 'lost' to higher scheduling classes. Since this
is an estimate, the timer can be either early or late. In case it is
early, task_tick_fair() will take the !need_resched() path and restart
the timer.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Reviewed-by: Juri Lelli
Acked-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6735,21 +6735,37 @@ static inline void sched_fair_update_sto
 static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
+	unsigned long scale = 1024;
+	unsigned long util = 0;
+	u64 vdelta;
+	u64 delta;
 
 	WARN_ON_ONCE(task_rq(p) != rq);
 
-	if (rq->cfs.h_nr_queued > 1) {
-		u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
-		u64 slice = se->slice;
-		s64 delta = slice - ran;
-
-		if (delta < 0) {
-			if (task_current_donor(rq, p))
-				resched_curr(rq);
-			return;
-		}
-		hrtick_start(rq, delta);
+	if (rq->cfs.h_nr_queued <= 1)
+		return;
+
+	/*
+	 * Compute time until virtual deadline
+	 */
+	vdelta = se->deadline - se->vruntime;
+	if ((s64)vdelta < 0) {
+		if (task_current_donor(rq, p))
+			resched_curr(rq);
+		return;
 	}
+	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
+
+	/*
+	 * Correct for instantaneous load of other classes.
+	 */
+	util += cpu_util_irq(rq);
+	if (util && util < 1024) {
+		scale *= 1024;
+		scale /= (1024 - util);
+	}
+
+	hrtick_start(rq, (scale * delta) / 1024);
 }
 
 /*
@@ -13365,11 +13381,8 @@ static void task_tick_fair(struct rq *rq
 		entity_tick(cfs_rq, se, queued);
 	}
 
-	if (queued) {
-		if (!need_resched())
-			hrtick_start_fair(rq, curr);
+	if (queued)
 		return;
-	}
 
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:22 +0100
Message-ID: <20260224163428.866374835@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 02/48] sched/fair: Simplify hrtick_update()
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra (Intel)

hrtick_update() was needed when the slice depended on nr_running; all
that code is gone. All that remains is starting the hrtick when
nr_running becomes more than 1.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c  | 12 ++++--------
 kernel/sched/sched.h |  4 ++++
 2 files changed, 8 insertions(+), 8 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6769,9 +6769,7 @@ static void hrtick_start_fair(struct rq
 }
 
 /*
- * called from enqueue/dequeue and updates the hrtick when the
- * current task is from our class and nr_running is low enough
- * to matter.
+ * Called on enqueue to start the hrtick when h_nr_queued becomes more than 1.
  */
 static void hrtick_update(struct rq *rq)
 {
@@ -6780,6 +6778,9 @@ static void hrtick_update(struct rq *rq)
 	if (!hrtick_enabled_fair(rq) || donor->sched_class != &fair_sched_class)
 		return;
 
+	if (hrtick_active(rq))
+		return;
+
 	hrtick_start_fair(rq, donor);
 }
 #else /* !CONFIG_SCHED_HRTICK: */
@@ -7102,9 +7103,6 @@ static int dequeue_entities(struct rq *r
 		WARN_ON_ONCE(!task_sleep);
 		WARN_ON_ONCE(p->on_rq != 1);
 
-		/* Fix-up what dequeue_task_fair() skipped */
-		hrtick_update(rq);
-
 		/*
 		 * Fix-up what block_task() skipped.
 		 *
@@ -7138,8 +7136,6 @@ static bool dequeue_task_fair(struct rq
 	/*
 	 * Must not reference @p after dequeue_entities(DEQUEUE_DELAYED).
 	 */
-
-	hrtick_update(rq);
 	return true;
 }
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3041,6 +3041,10 @@ static inline int hrtick_enabled_dl(stru
 }
 
 extern void hrtick_start(struct rq *rq, u64 delay);
+static inline bool hrtick_active(struct rq *rq)
+{
+	return hrtimer_active(&rq->hrtick_timer);
+}
 
 #else /* !CONFIG_SCHED_HRTICK: */

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:27 +0100
Message-ID: <20260224163428.933894105@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 03/48] sched/fair: Make hrtick resched hard
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra (Intel)

Since the tick causes hard preemption, the
hrtick should too. Letting the hrtick do lazy preemption completely
defeats the purpose, since the reschedule will then still be delayed
until a regular tick and be dependent on CONFIG_HZ.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5530,7 +5530,7 @@ entity_tick(struct cfs_rq *cfs_rq, struc
 	 * validating it and just reschedule.
 	 */
 	if (queued) {
-		resched_curr_lazy(rq_of(cfs_rq));
+		resched_curr(rq_of(cfs_rq));
 		return;
 	}
 #endif

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:32 +0100
Message-ID: <20260224163429.001511662@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 04/48] sched: Avoid ktime_get() indirection
References: <20260224163022.795809588@kernel.org>

The clock of the hrtick and deadline timers is known to be
CLOCK_MONOTONIC. No point in looking it up via hrtimer_cb_get_time().
Just use ktime_get() directly.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c     | 3 +--
 kernel/sched/deadline.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -925,7 +925,6 @@ static void __hrtick_start(void *arg)
  */
 void hrtick_start(struct rq *rq, u64 delay)
 {
-	struct hrtimer *timer = &rq->hrtick_timer;
 	s64 delta;
 
 	/*
@@ -933,7 +932,7 @@ void hrtick_start(struct rq *rq, u64 del
 	 * doesn't make sense and can cause timer DoS.
 	 */
 	delta = max_t(s64, delay, 10000LL);
-	rq->hrtick_time = ktime_add_ns(hrtimer_cb_get_time(timer), delta);
+	rq->hrtick_time = ktime_add_ns(ktime_get(), delta);
 
 	if (rq == this_rq())
 		__hrtick_restart(rq);
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1097,7 +1097,7 @@ static int start_dl_timer(struct sched_d
 		act = ns_to_ktime(dl_next_period(dl_se));
 	}
 
-	now = hrtimer_cb_get_time(timer);
+	now = ktime_get();
 	delta = ktime_to_ns(now) - rq_clock(rq);
 	act = ktime_add_ns(act, delta);

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:37 +0100
Message-ID: <20260224163429.069535561@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 05/48] hrtimer: Avoid pointless reprogramming in __hrtimer_start_range_ns()
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

Much like hrtimer_reprogram(), skip programming if the cpu_base is
running the hrtimer
interrupt.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Reviewed-by: Juri Lelli
Reviewed-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c | 8 ++++++++
 1 file changed, 8 insertions(+)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1269,6 +1269,14 @@ static int __hrtimer_start_range_ns(stru
 	}
 
 	first = enqueue_hrtimer(timer, new_base, mode);
+
+	/*
+	 * If the hrtimer interrupt is running, then it will reevaluate the
+	 * clock bases and reprogram the clock event device.
+	 */
+	if (new_base->cpu_base->in_hrtirq)
+		return false;
+
 	if (!force_local) {
 		/*
 		 * If the current CPU base is online, then the timer is

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:42 +0100
Message-ID: <20260224163429.136503358@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 06/48] hrtimer: Provide a static branch based hrtimer_hres_enabled()
References: <20260224163022.795809588@kernel.org>

The scheduler evaluates the high resolution mode via
hrtimer_is_hres_active() every time it has to update the HRTICK timer.
This needs to follow three pointers, which is expensive.

Provide a static branch based mechanism to avoid that.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/hrtimer.h | 13 +++++++++----
 kernel/time/hrtimer.c   | 28 +++++++++++++++++++++++++---
 2 files changed, 34 insertions(+), 7 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -153,17 +153,22 @@ static inline int hrtimer_is_hres_active
 }
 
 #ifdef CONFIG_HIGH_RES_TIMERS
+extern unsigned int hrtimer_resolution;
 struct clock_event_device;
 
 extern void hrtimer_interrupt(struct clock_event_device *dev);
 
-extern unsigned int hrtimer_resolution;
+extern struct static_key_false hrtimer_highres_enabled_key;
 
-#else
+static inline bool hrtimer_highres_enabled(void)
+{
+	return static_branch_likely(&hrtimer_highres_enabled_key);
+}
 
+#else /* CONFIG_HIGH_RES_TIMERS */
 #define hrtimer_resolution	(unsigned int)LOW_RES_NSEC
-
-#endif
+static inline bool hrtimer_highres_enabled(void) { return false; }
+#endif /* !CONFIG_HIGH_RES_TIMERS */
 
 static inline ktime_t
 __hrtimer_expires_remaining_adjusted(const struct hrtimer *timer, ktime_t now)
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -126,6 +126,25 @@ static inline bool hrtimer_base_is_onlin
 	return likely(base->online);
 }
 
+#ifdef CONFIG_HIGH_RES_TIMERS
+DEFINE_STATIC_KEY_FALSE(hrtimer_highres_enabled_key);
+
+static void hrtimer_hres_workfn(struct work_struct *work)
+{
+	static_branch_enable(&hrtimer_highres_enabled_key);
+}
+
+static DECLARE_WORK(hrtimer_hres_work, hrtimer_hres_workfn);
+
+static inline void hrtimer_schedule_hres_work(void)
+{
+	if (!hrtimer_highres_enabled())
+		schedule_work(&hrtimer_hres_work);
+}
+#else
+static inline void hrtimer_schedule_hres_work(void) { }
+#endif
+
 /*
  * Functions and macros which are different for UP/SMP systems are kept in a
  * single place
@@ -649,7 +668,9 @@ static inline ktime_t hrtimer_update_bas
 }
 
 /*
- * Is the high resolution mode active ?
+ * Is the high resolution mode active in the CPU base. This cannot use the
+ * static key as the CPUs are switched to high resolution mode
+ * asynchronously.
  */
 static inline int hrtimer_hres_active(struct hrtimer_cpu_base *cpu_base)
 {
@@ -750,6 +771,7 @@ static void hrtimer_switch_to_hres(void)
 	tick_setup_sched_timer(true);
 	/* "Retrigger" the interrupt to get things going */
 	retrigger_next_event(NULL);
+	hrtimer_schedule_hres_work();
 }
 
 #else
@@ -947,11 +969,10 @@ static bool update_needs_ipi(struct hrti
  */
 void clock_was_set(unsigned int bases)
 {
-	struct hrtimer_cpu_base *cpu_base = raw_cpu_ptr(&hrtimer_bases);
 	cpumask_var_t mask;
 	int cpu;
 
-	if (!hrtimer_hres_active(cpu_base) && !tick_nohz_is_active())
+	if (!hrtimer_highres_enabled() && !tick_nohz_is_active())
 		goto out_timerfd;
 
 	if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
@@ -962,6 +983,7 @@ void clock_was_set(unsigned int bases)
 	/* Avoid interrupting CPUs if possible */
 	cpus_read_lock();
 	for_each_online_cpu(cpu) {
+		struct hrtimer_cpu_base *cpu_base;
 		unsigned long flags;
 
 		cpu_base = &per_cpu(hrtimer_bases, cpu);

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:35:47 +0100
Message-ID: <20260224163429.203610956@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 07/48] sched: Use hrtimer_highres_enabled()
References: <20260224163022.795809588@kernel.org>

Use the static branch based variant and thereby avoid following three
pointers.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/hrtimer.h |  6 ------
 kernel/sched/sched.h    | 37 +++++++++----------------------------
 2 files changed, 9 insertions(+), 34 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -146,12 +146,6 @@ static inline ktime_t hrtimer_expires_re
 	return ktime_sub(timer->node.expires, hrtimer_cb_get_time(timer));
 }
 
-static inline int hrtimer_is_hres_active(struct hrtimer *timer)
-{
-	return IS_ENABLED(CONFIG_HIGH_RES_TIMERS) ?
-		timer->base->cpu_base->hres_active : 0;
-}
-
 #ifdef CONFIG_HIGH_RES_TIMERS
 extern unsigned int hrtimer_resolution;
 struct clock_event_device;
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3019,25 +3019,19 @@ extern unsigned int sysctl_numa_balancin
  *  - enabled by features
  *  - hrtimer is actually high res
  */
-static inline int hrtick_enabled(struct rq *rq)
+static inline bool hrtick_enabled(struct rq *rq)
 {
-	if (!cpu_active(cpu_of(rq)))
-		return 0;
-	return hrtimer_is_hres_active(&rq->hrtick_timer);
+	return cpu_active(cpu_of(rq)) && hrtimer_highres_enabled();
 }
 
-static inline int hrtick_enabled_fair(struct rq *rq)
+static inline bool hrtick_enabled_fair(struct rq *rq)
 {
-	if (!sched_feat(HRTICK))
-		return 0;
-	return hrtick_enabled(rq);
+	return sched_feat(HRTICK) && hrtick_enabled(rq);
 }
 
-static inline int hrtick_enabled_dl(struct rq *rq)
+static inline bool hrtick_enabled_dl(struct rq *rq)
 {
-	if (!sched_feat(HRTICK_DL))
-		return 0;
-	return hrtick_enabled(rq);
+	return sched_feat(HRTICK_DL) && hrtick_enabled(rq);
 }
 
 extern void hrtick_start(struct rq *rq, u64 delay);
@@ -3047,22 +3041,9 @@ static inline bool hrtick_active(struct
 }
 
 #else /* !CONFIG_SCHED_HRTICK: */
-
-static inline int hrtick_enabled_fair(struct rq *rq)
-{
-	return 0;
-}
-
-static inline int hrtick_enabled_dl(struct rq *rq)
-{
-	return 0;
-}
-
-static inline int hrtick_enabled(struct rq *rq)
-{
-	return 0;
-}
-
+static inline bool hrtick_enabled_fair(struct rq *rq) { return false; }
+static inline bool hrtick_enabled_dl(struct rq *rq) { return false; }
+static inline bool hrtick_enabled(struct rq *rq) { return false; }
 #endif /* !CONFIG_SCHED_HRTICK */
 
 #ifndef arch_scale_freq_tick
b=uanKlBIfpPRGJvzgwo6f93bmTR+m/MMnuuCpdWnhYGsbNZKNouJeCLZgsoPlOYWUv UOWxSRonegaPJ8BAWNivFEjLS1bRR0dXhP1HdofkfDmd1iLtiQ8QiDmnGSGoWgRvec PL1V3g444f6Krtt4ZoKBX+77wGg7RWC+yD60tLdTDEYSyhWAdzCa1eB8SEtQfmxGux LXTm4a/6S2vix4zxjb4PNzXM5EbTJjdJdtAAmUEejdt5Ivlas3/0nhe1U6zkkERkDg kW2Lv74VZwoyWTwwL68ODBTXMvRq/7zCeaCcL89d+h6JtPdkwWTAIA9gsZKPQiaCWC kh0Kh5pGBDx1w== Date: Tue, 24 Feb 2026 17:35:52 +0100 Message-ID: <20260224163429.273068659@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 08/48] sched: Optimize hrtimer handling References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" schedule() provides several mechanisms to update the hrtick timer: 1) When the next task is picked 2) When the balance callbacks are invoked before rq::lock is released Each of them can result in a first expiring timer and cause a reprogram of the clock event device. Solve this by deferring the rearm to the end of schedule() right before releasing rq::lock by setting a flag on entry which tells hrtick_start() to cache the runtime constraint in rq::hrtick_delay without touching the timer itself. Right before releasing rq::lock evaluate the flags and either rearm or cancel the hrtick timer. 
Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 57 ++++++++++++++++++++++++++++++++++++++++++----= ----- kernel/sched/sched.h | 2 + 2 files changed, 50 insertions(+), 9 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -872,6 +872,12 @@ void update_rq_clock(struct rq *rq) * Use HR-timers to deliver accurate preemption points. */ =20 +enum { + HRTICK_SCHED_NONE =3D 0, + HRTICK_SCHED_DEFER =3D BIT(1), + HRTICK_SCHED_START =3D BIT(2), +}; + static void hrtick_clear(struct rq *rq) { if (hrtimer_active(&rq->hrtick_timer)) @@ -932,6 +938,17 @@ void hrtick_start(struct rq *rq, u64 del * doesn't make sense and can cause timer DoS. */ delta =3D max_t(s64, delay, 10000LL); + + /* + * If this is in the middle of schedule() only note the delay + * and let hrtick_schedule_exit() deal with it. + */ + if (rq->hrtick_sched) { + rq->hrtick_sched |=3D HRTICK_SCHED_START; + rq->hrtick_delay =3D delta; + return; + } + rq->hrtick_time =3D ktime_add_ns(ktime_get(), delta); =20 if (rq =3D=3D this_rq()) @@ -940,19 +957,40 @@ void hrtick_start(struct rq *rq, u64 del smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd); } =20 -static void hrtick_rq_init(struct rq *rq) +static inline void hrtick_schedule_enter(struct rq *rq) { - INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq); - hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, HRTIMER_MODE_RE= L_HARD); + rq->hrtick_sched =3D HRTICK_SCHED_DEFER; } -#else /* !CONFIG_SCHED_HRTICK: */ -static inline void hrtick_clear(struct rq *rq) + +static inline void hrtick_schedule_exit(struct rq *rq) { + if (rq->hrtick_sched & HRTICK_SCHED_START) { + rq->hrtick_time =3D ktime_add_ns(ktime_get(), rq->hrtick_delay); + __hrtick_restart(rq); + } else if (idle_rq(rq)) { + /* + * No need for using hrtimer_is_active(). The timer is CPU local + * and interrupts are disabled, so the callback cannot be + * running and the queued state is valid. 
+ */ + if (hrtimer_is_queued(&rq->hrtick_timer)) + hrtimer_cancel(&rq->hrtick_timer); + } + + rq->hrtick_sched =3D HRTICK_SCHED_NONE; } =20 -static inline void hrtick_rq_init(struct rq *rq) +static void hrtick_rq_init(struct rq *rq) { + INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq); + rq->hrtick_sched =3D HRTICK_SCHED_NONE; + hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, HRTIMER_MODE_RE= L_HARD); } +#else /* !CONFIG_SCHED_HRTICK: */ +static inline void hrtick_clear(struct rq *rq) { } +static inline void hrtick_rq_init(struct rq *rq) { } +static inline void hrtick_schedule_enter(struct rq *rq) { } +static inline void hrtick_schedule_exit(struct rq *rq) { } #endif /* !CONFIG_SCHED_HRTICK */ =20 /* @@ -5028,6 +5066,7 @@ static inline void finish_lock_switch(st */ spin_acquire(&__rq_lockp(rq)->dep_map, 0, 0, _THIS_IP_); __balance_callbacks(rq, NULL); + hrtick_schedule_exit(rq); raw_spin_rq_unlock_irq(rq); } =20 @@ -6781,9 +6820,6 @@ static void __sched notrace __schedule(i =20 schedule_debug(prev, preempt); =20 - if (sched_feat(HRTICK) || sched_feat(HRTICK_DL)) - hrtick_clear(rq); - klp_sched_try_switch(prev); =20 local_irq_disable(); @@ -6810,6 +6846,8 @@ static void __sched notrace __schedule(i rq_lock(rq, &rf); smp_mb__after_spinlock(); =20 + hrtick_schedule_enter(rq); + /* Promote REQ to ACT */ rq->clock_update_flags <<=3D 1; update_rq_clock(rq); @@ -6911,6 +6949,7 @@ static void __sched notrace __schedule(i =20 rq_unpin_lock(rq, &rf); __balance_callbacks(rq, NULL); + hrtick_schedule_exit(rq); raw_spin_rq_unlock_irq(rq); } trace_sched_exit_tp(is_switch); --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1285,6 +1285,8 @@ struct rq { call_single_data_t hrtick_csd; struct hrtimer hrtick_timer; ktime_t hrtick_time; + ktime_t hrtick_delay; + unsigned int hrtick_sched; #endif =20 #ifdef CONFIG_SCHEDSTATS From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 
Date: Tue, 24 Feb 2026 17:35:56 +0100 Message-ID:
<20260224163429.340593047@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 09/48] sched/hrtick: Avoid tiny hrtick rearms References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Tiny adjustments to the hrtick expiry time below 5 microseconds are just causing extra work for no real value. Filter them out when restarting the hrtick. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -903,12 +903,24 @@ static enum hrtimer_restart hrtick(struc return HRTIMER_NORESTART; } =20 -static void __hrtick_restart(struct rq *rq) +static inline bool hrtick_needs_rearm(struct hrtimer *timer, ktime_t expir= es) +{ + /* + * Queued is false when the timer is not started or currently + * running the callback. In both cases, restart. If queued check + * whether the expiry time actually changes substantially. 
+ */ + return !hrtimer_is_queued(timer) || + abs(expires - hrtimer_get_expires(timer)) > 5000; +} + +static void hrtick_cond_restart(struct rq *rq) { struct hrtimer *timer =3D &rq->hrtick_timer; ktime_t time =3D rq->hrtick_time; =20 - hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED_HARD); + if (hrtick_needs_rearm(timer, time)) + hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED_HARD); } =20 /* @@ -920,7 +932,7 @@ static void __hrtick_start(void *arg) struct rq_flags rf; =20 rq_lock(rq, &rf); - __hrtick_restart(rq); + hrtick_cond_restart(rq); rq_unlock(rq, &rf); } =20 @@ -950,9 +962,11 @@ void hrtick_start(struct rq *rq, u64 del } =20 rq->hrtick_time =3D ktime_add_ns(ktime_get(), delta); + if (!hrtick_needs_rearm(&rq->hrtick_timer, rq->hrtick_time)) + return; =20 if (rq =3D=3D this_rq()) - __hrtick_restart(rq); + hrtimer_start(&rq->hrtick_timer, rq->hrtick_time, HRTIMER_MODE_ABS_PINNE= D_HARD); else smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd); } @@ -966,7 +980,7 @@ static inline void hrtick_schedule_exit( { if (rq->hrtick_sched & HRTICK_SCHED_START) { rq->hrtick_time =3D ktime_add_ns(ktime_get(), rq->hrtick_delay); - __hrtick_restart(rq); + hrtick_cond_restart(rq); } else if (idle_rq(rq)) { /* * No need for using hrtimer_is_active(). 
The timer is CPU local From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:36:01 +0100 Message-ID: <20260224163429.408524456@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 10/48] hrtimer: Provide LAZY_REARM mode References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra The hrtick timer is frequently rearmed before expiry and most of the time the new expiry is past the armed one. As this happens on every context switch, it becomes expensive with scheduling-heavy workloads, especially in virtual machines, as the "hardware" reprogramming implies a VM exit. Add a lazy rearm mode flag which skips the reprogramming if: 1) The timer was the first expiring timer before the rearm 2) The new expiry time is farther out than the armed time This avoids a massive amount of reprogramming operations of the hrtick timer for the price of eventually taking the already armed interrupt for nothing. Signed-off-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- include/linux/hrtimer.h | 8 ++++++++ include/linux/hrtimer_types.h | 3 +++ kernel/time/hrtimer.c | 17 ++++++++++++++++- 3 files changed, 27 insertions(+), 1 deletion(-) --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -31,6 +31,13 @@ * soft irq context * HRTIMER_MODE_HARD - Timer callback function will be executed in * hard irq context even on PREEMPT_RT.
+ * HRTIMER_MODE_LAZY_REARM - Avoid reprogramming if the timer was the + * first expiring timer and is moved into the + * future. Special mode for the HRTICK timer to + * avoid extensive reprogramming of the hardware, + * which is expensive in virtual machines. Risks + * a pointless expiry, but that's better than + * reprogramming on every context switch. */ enum hrtimer_mode { HRTIMER_MODE_ABS = 0x00, @@ -38,6 +45,7 @@ enum hrtimer_mode { HRTIMER_MODE_PINNED = 0x02, HRTIMER_MODE_SOFT = 0x04, HRTIMER_MODE_HARD = 0x08, + HRTIMER_MODE_LAZY_REARM = 0x10, HRTIMER_MODE_ABS_PINNED = HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED, HRTIMER_MODE_REL_PINNED = HRTIMER_MODE_REL | HRTIMER_MODE_PINNED, --- a/include/linux/hrtimer_types.h +++ b/include/linux/hrtimer_types.h @@ -33,6 +33,8 @@ enum hrtimer_restart { * @is_soft: Set if hrtimer will be expired in soft interrupt context. * @is_hard: Set if hrtimer will be expired in hard interrupt context * even on RT. + * @is_lazy: Set if the timer is frequently rearmed to avoid updates + * of the clock event device * * The hrtimer structure must be initialized by hrtimer_setup() */ @@ -45,6 +47,7 @@ struct hrtimer { u8 is_rel; u8 is_soft; u8 is_hard; + u8 is_lazy; }; #endif /* _LINUX_HRTIMER_TYPES_H */ --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1152,7 +1152,7 @@ static void __remove_hrtimer(struct hrti * an superfluous call to hrtimer_force_reprogram() on the * remote cpu later on if the same timer gets enqueued again. */ - if (reprogram && timer == cpu_base->next_timer) + if (reprogram && timer == cpu_base->next_timer && !timer->is_lazy) hrtimer_force_reprogram(cpu_base, 1); } @@ -1322,6 +1322,20 @@ static int __hrtimer_start_range_ns(stru } /* + * Special case for the HRTICK timer. It is frequently rearmed and most + * of the time moves the expiry into the future.
That's expensive in + * virtual machines and it's better to take the pointless already armed + * interrupt than reprogramming the hardware on every context switch. + * + * If the new expiry is before the armed time, then reprogramming is + * required. + */ + if (timer->is_lazy) { + if (new_base->cpu_base->expires_next <=3D hrtimer_get_expires(timer)) + return 0; + } + + /* * Timer was forced to stay on the current CPU to avoid * reprogramming on removal and enqueue. Force reprogram the * hardware by evaluating the new first expiring timer. @@ -1675,6 +1689,7 @@ static void __hrtimer_setup(struct hrtim base +=3D hrtimer_clockid_to_base(clock_id); timer->is_soft =3D softtimer; timer->is_hard =3D !!(mode & HRTIMER_MODE_HARD); + timer->is_lazy =3D !!(mode & HRTIMER_MODE_LAZY_REARM); timer->base =3D &cpu_base->clock_base[base]; timerqueue_init(&timer->node); From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DB343A784D for ; Tue, 24 Feb 2026 16:36:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950969; cv=none; b=qxX5DmwW8a2Bf5HMiLjHo3AQXE4o46hlXkYhv7TONJPiJBjUgLIx7YvMTqTn0mHdSCA9fVFkSX1gB3WQP47lEsLqVL+cnrurWnBZ6qeYn8xFylePn4V6WvHzbkeFMSTRnX78E9iC5KfxB/balmY13f75PgzYVUMDP5v0tJLTFJA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950969; c=relaxed/simple; bh=sCGdU6vnxJTJkuw6tdgNFr8GGsntVWTC1+pVjYDnALk=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=G/m8NPVeF0HARkvtmJOjCLQnz2khI+18PoTVP+w1rh2x43GpQN3p6kgerrhTVyXhuPZjTsW2FNu2ofattylNkbMdmstAKOXc43jqbJdkUEW8UnTGGoav9ZVshoFVOhU5jY2HtlQVsriixgbzLz0WK/cLkvhLn0hAXOSyMwHk2ME= 
Date: Tue, 24 Feb 2026 17:36:06 +0100 Message-ID: <20260224163429.475409346@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 11/48] sched/hrtick: Mark hrtick timer LAZY_REARM References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra The hrtick timer is frequently rearmed before expiry and most of the time the new expiry is past the armed one. As this happens on every context switch, it becomes expensive with scheduling-heavy workloads, especially in virtual machines, as the "hardware" reprogramming implies a VM exit.
hrtimer now provides a lazy rearm mode flag which skips the reprogramming if: 1) The timer was the first expiring timer before the rearm 2) The new expiry time is farther out than the armed time This avoids a massive amount of reprogramming operations of the hrtick timer for the price of eventually taking the already armed interrupt for nothing. Mark the hrtick timer accordingly. Signed-off-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- kernel/sched/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -998,7 +998,8 @@ static void hrtick_rq_init(struct rq *rq { INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq); rq->hrtick_sched = HRTICK_SCHED_NONE; - hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); + hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, + HRTIMER_MODE_REL_HARD | HRTIMER_MODE_LAZY_REARM); } #else /* !CONFIG_SCHED_HRTICK: */ static inline void hrtick_clear(struct rq *rq) { } From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:36:10 +0100 Message-ID: <20260224163429.542178086@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 12/48] tick/sched: Avoid hrtimer_cancel/start() sequence References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The sequence of cancel and start is inefficient.
It has to do the timer lock/unlock twice and in the worst case has to reprogram the underlying clock event device twice. The reason why it is done this way is the usage of hrtimer_forward_now(), which requires the timer to be inactive. But that can be completely avoided as the forward can be done on a variable and does not need any of the overrun accounting provided by hrtimer_forward_now(). Implement a trivial forwarding mechanism and replace the cancel/reprogram sequence with hrtimer_start(..., new_expiry). For the non high resolution case the timer is not actually armed, but used for storage so that code checking for expiry times can unconditionally look it up in the timer. So it is safe for that case to set the new expiry time directly. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- kernel/time/tick-sched.c | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-) --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -864,19 +864,32 @@ u64 get_cpu_iowait_time_us(int cpu, u64 } EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us); +/* Simplified variant of hrtimer_forward_now() */ +static ktime_t tick_forward_now(ktime_t expires, ktime_t now) +{ + ktime_t delta = now - expires; + + if (likely(delta < TICK_NSEC)) + return expires + TICK_NSEC; + + expires += TICK_NSEC * ktime_divns(delta, TICK_NSEC); + if (expires > now) + return expires; + return expires + TICK_NSEC; +} + static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) { - hrtimer_cancel(&ts->sched_timer); - hrtimer_set_expires(&ts->sched_timer, ts->last_tick); + ktime_t expires = ts->last_tick; - /* Forward the time to expire in the future */ - hrtimer_forward(&ts->sched_timer, now, TICK_NSEC); + if (now >= expires) + expires = tick_forward_now(expires, now); if (tick_sched_flag_test(ts, TS_FLAG_HIGHRES)) { - hrtimer_start_expires(&ts->sched_timer, - HRTIMER_MODE_ABS_PINNED_HARD); + hrtimer_start(&ts->sched_timer, expires,
HRTIMER_MODE_ABS_PINNED_HARD); } else { - tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1); + hrtimer_set_expires(&ts->sched_timer, expires); + tick_program_event(expires, 1); } =20 /* From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7BA63A7F64 for ; Tue, 24 Feb 2026 16:36:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950978; cv=none; b=SHfOp4rSvQYS7285EFyUu2UpYzkAlgu8SCCK2iw3kVQfEldo1W5XAh1eBig9OZrAEpU7VIk2n32oekSGleWFjn7dpKBUP557oJ9FSDXmF4ZX90nGW0PbF0qOVutcDH6r6Qqn3y7EgeUt+fQz6CoA3kn3CqJrTjBZwJEyTLLgL/c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950978; c=relaxed/simple; bh=EXbifuHpIDd/8o4MFyR5Z/UiiXn9TTSCoZIjZ4QYDyk=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=ft/SL711uEDPuNucRe7Fcv/gDawzx3SuewoLIa3aS6ocQUBfVaV7E5DaDwHyFx/1aJnPp3B5YNv484uJ0iQraVVTVjk9B0L6Mdhymm9BJ/z2XZbmteOkaV6aMz8elPGOxxaZqjm4wX0VvtBsQOis7w5A6yDOLtctLA0beRf8X8k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pKBIO3Jd; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pKBIO3Jd" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BFEF3C19423; Tue, 24 Feb 2026 16:36:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771950978; bh=EXbifuHpIDd/8o4MFyR5Z/UiiXn9TTSCoZIjZ4QYDyk=; h=Date:From:To:Cc:Subject:References:From; b=pKBIO3JdT9Oo2uFlWIkpEyaGJyeI/OwKUsF4dMtnQJ1LOYwITqSkV6SL8N+k/sf/J 
o8xayYoovAP6F+6Il7RKr72Udl2fGm0ZK2i38IYKEVlTJc0llz2h3TLKD+pZjRCljo g1eQ+DmvnkVotToNuyB7X+FU2QhaVd75rTbbkj1GYt2BibGp64PQykcOIx+yFRXs+7 e7E9pe48+ebSlmKtUM4RQ4H+jDhR/8UZIG0vG5D1HCom7SH/9YwpsfV7jNPZh1yiib 9CN90rIB0uHBbNa3saNFxim6eQ+sUojt4a2AMMGVRdNFZiCjPWNryMr6h3qfkJBR2W TsovxMCrMeTEQ== Date: Tue, 24 Feb 2026 17:36:15 +0100 Message-ID: <20260224163429.609049777@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 13/48] clockevents: Remove redundant CLOCK_EVT_FEAT_KTIME References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The only real usecase for this is the hrtimer based broadcast device. No point in using two different feature flags for this. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- include/linux/clockchips.h | 1 - kernel/time/clockevents.c | 4 ++-- kernel/time/tick-broadcast-hrtimer.c | 1 - 3 files changed, 2 insertions(+), 4 deletions(-) --- a/include/linux/clockchips.h +++ b/include/linux/clockchips.h @@ -45,7 +45,6 @@ enum clock_event_state { */ # define CLOCK_EVT_FEAT_PERIODIC 0x000001 # define CLOCK_EVT_FEAT_ONESHOT 0x000002 -# define CLOCK_EVT_FEAT_KTIME 0x000004 =20 /* * x86(64) specific (mis)features: --- a/kernel/time/clockevents.c +++ b/kernel/time/clockevents.c @@ -319,8 +319,8 @@ int clockevents_program_event(struct clo WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n", clockevent_get_state(dev)); =20 - /* Shortcut for clockevent devices that can deal with ktime. 
*/ - if (dev->features & CLOCK_EVT_FEAT_KTIME) + /* ktime_t based reprogramming for the broadcast hrtimer device */ + if (unlikely(dev->features & CLOCK_EVT_FEAT_HRTIMER)) return dev->set_next_ktime(expires, dev); =20 delta =3D ktime_to_ns(ktime_sub(expires, ktime_get())); --- a/kernel/time/tick-broadcast-hrtimer.c +++ b/kernel/time/tick-broadcast-hrtimer.c @@ -78,7 +78,6 @@ static struct clock_event_device ce_broa .set_state_shutdown =3D bc_shutdown, .set_next_ktime =3D bc_set_next, .features =3D CLOCK_EVT_FEAT_ONESHOT | - CLOCK_EVT_FEAT_KTIME | CLOCK_EVT_FEAT_HRTIMER, .rating =3D 0, .bound_on =3D -1, From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5FF523A7840 for ; Tue, 24 Feb 2026 16:36:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950983; cv=none; b=WSC3ruiNKQdNko7iIyLrLxncsXfDUzlSFl/v8fkKM8xU9FmNzLODTpuQCxvQe3qcNz/wsAk11UqI5hCkGKb/OPLi94bk1KzUDqTOX53L7Q6b43+TiUp3z6TBRhUWjcsq64so7cKM9FOelofwo2Jf3fRDOuNRN3zlMrLBfE4LG/0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950983; c=relaxed/simple; bh=XWBXrKeLbBg6AgsSOTEYlmPS2L9IozPEpf9xXdHYVYw=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=gO3R8SPMLntKBpQU6gLWpvhtIjXGDNQ/sThaoT8OH7t6CQD0wuzcBgg5hpsSMRClfUwFJgceNU0CQvAPxU6UM+oiEgA2XX5xD5pHZStJlnd4LyyjxK6r1uq4AMoafTtc6N7dow9oQLRChjbgMEMi17eOC/bVX16TpVhs/Oku/aI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QqF2yGqd; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b="QqF2yGqd" Received: by smtp.kernel.org (Postfix) with ESMTPSA id AD165C19425; Tue, 24 Feb 2026 16:36:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771950983; bh=XWBXrKeLbBg6AgsSOTEYlmPS2L9IozPEpf9xXdHYVYw=; h=Date:From:To:Cc:Subject:References:From; b=QqF2yGqdLtu+1QvfxWVoOYUuB9jJDv7ztWuhu1M4FJNGLAn/nRglnXaZd+F+w1YC6 KreT8MmMW7gFQzniOfZhLdDkTiXlVdGKKGiw9L1Gg9ILh9Bdd5/cLVfbVRURbq16Oq jrpL1ul2ra1EPKmuRS5q1kem9oSsSP0m70UZNUsrhbd10o2S00H5N/wD00E1e/4nsa +EpQT12Gr/I2akUlSHm291U/IZwg3oQYvWCOr0eAP0ne5UmXwmMsDFJ4ZvHpqEaLaw 6TQJPUrM1sGu6WJbTKMe3/En3B8oZvNdp54lQbRKQAe9V01Jir7WJbGdGXBjRgaQNQ BrYYX7BaIZSZw== Date: Tue, 24 Feb 2026 17:36:20 +0100 Message-ID: <20260224163429.675151545@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 14/48] timekeeping: Allow inlining clocksource::read() References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" On some architectures clocksource::read() boils down to a single instruction, so the indirect function call is just a massive overhead especially with speculative execution mitigations in effect. 
Allow architectures to enable conditional inlining of that read to avoid that by: - providing a static branch to switch to the inlined variant - disabling the branch before clocksource changes - enabling the branch after a clocksource change, when the clocksource indicates in a feature flag that it is the one which provides the inlined variant This is intentionally not a static call as that would only remove the indirect call, but not the rest of the overhead. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- include/linux/clocksource.h | 2 + kernel/time/Kconfig | 3 + kernel/time/timekeeping.c | 74 ++++++++++++++++++++++++++++++++-------= ----- 3 files changed, 60 insertions(+), 19 deletions(-) --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -149,6 +149,8 @@ struct clocksource { #define CLOCK_SOURCE_SUSPEND_NONSTOP 0x80 #define CLOCK_SOURCE_RESELECT 0x100 #define CLOCK_SOURCE_VERIFY_PERCPU 0x200 +#define CLOCK_SOURCE_CAN_INLINE_READ 0x400 + /* simplify initialization of mask field */ #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0) =20 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -17,6 +17,9 @@ config ARCH_CLOCKSOURCE_DATA config ARCH_CLOCKSOURCE_INIT bool =20 +config ARCH_WANTS_CLOCKSOURCE_READ_INLINE + bool + # Timekeeping vsyscall support config GENERIC_TIME_VSYSCALL bool --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -3,34 +3,30 @@ * Kernel timekeeping code and accessor functions. Based on code from * timer.c, moved in commit 8524070b7982. 
*/ -#include -#include -#include +#include +#include +#include +#include #include -#include -#include -#include +#include #include -#include -#include +#include +#include #include +#include +#include +#include #include -#include -#include +#include #include #include -#include -#include -#include -#include -#include -#include +#include =20 #include =20 #include "tick-internal.h" -#include "ntp_internal.h" #include "timekeeping_internal.h" +#include "ntp_internal.h" =20 #define TK_CLEAR_NTP (1 << 0) #define TK_CLOCK_WAS_SET (1 << 1) @@ -275,6 +271,11 @@ static inline void tk_update_sleep_time( tk->monotonic_to_boot =3D ktime_to_timespec64(tk->offs_boot); } =20 +#ifdef CONFIG_ARCH_WANTS_CLOCKSOURCE_READ_INLINE +#include + +static DEFINE_STATIC_KEY_FALSE(clocksource_read_inlined); + /* * tk_clock_read - atomic clocksource read() helper * @@ -288,13 +289,36 @@ static inline void tk_update_sleep_time( * a read of the fast-timekeeper tkrs (which is protected by its own locki= ng * and update logic). */ -static inline u64 tk_clock_read(const struct tk_read_base *tkr) +static __always_inline u64 tk_clock_read(const struct tk_read_base *tkr) { struct clocksource *clock =3D READ_ONCE(tkr->clock); =20 + if (static_branch_likely(&clocksource_read_inlined)) + return arch_inlined_clocksource_read(clock); + return clock->read(clock); } =20 +static inline void clocksource_disable_inline_read(void) +{ + static_branch_disable(&clocksource_read_inlined); +} + +static inline void clocksource_enable_inline_read(void) +{ + static_branch_enable(&clocksource_read_inlined); +} +#else +static __always_inline u64 tk_clock_read(const struct tk_read_base *tkr) +{ + struct clocksource *clock =3D READ_ONCE(tkr->clock); + + return clock->read(clock); +} +static inline void clocksource_disable_inline_read(void) { } +static inline void clocksource_enable_inline_read(void) { } +#endif + /** * tk_setup_internals - Set up internals to use clocksource clock. 
* @@ -375,7 +399,7 @@ static noinline u64 delta_to_ns_safe(con return mul_u64_u32_add_u64_shr(delta, tkr->mult, tkr->xtime_nsec, tkr->sh= ift); } =20 -static inline u64 timekeeping_cycles_to_ns(const struct tk_read_base *tkr,= u64 cycles) +static __always_inline u64 timekeeping_cycles_to_ns(const struct tk_read_b= ase *tkr, u64 cycles) { /* Calculate the delta since the last update_wall_time() */ u64 mask =3D tkr->mask, delta =3D (cycles - tkr->cycle_last) & mask; @@ -1631,7 +1655,19 @@ int timekeeping_notify(struct clocksourc =20 if (tk->tkr_mono.clock =3D=3D clock) return 0; + + /* Disable inlined reads across the clocksource switch */ + clocksource_disable_inline_read(); + stop_machine(change_clocksource, clock, NULL); + + /* + * If the clocksource has been selected and supports inlined reads, + * enable the branch. + */ + if (tk->tkr_mono.clock =3D=3D clock && clock->flags & CLOCK_SOURCE_CAN_IN= LINE_READ) + clocksource_enable_inline_read(); + tick_clock_notify(); return tk->tkr_mono.clock =3D=3D clock ?
0 : -1; } From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32A153A7F7C for ; Tue, 24 Feb 2026 16:36:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950988; cv=none; b=Z/yASw365VHT8eoR6KzvsC55tTiDMcsL8NkBPCvPfPdDsQA8lJIDiGUoGu3bXG7lPalvzYslsKAK1aX5IrSz1KyQASUzIiqLAJK/VFGT4tP9qImCKg2xnqzyaYZsQ7OPkETcsOYDhMRUMBIvmgQAGoPXNwFfq4iwSCgid2UBmEg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950988; c=relaxed/simple; bh=Dnjjqe21GBxvZ5X+pDOO1X6vhlAY3v8cSHNjce7QgVM=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=A6jUrc115DqunNjmt9UAzzDKAcSbLc+wCpUkEncRE1y4FD/BlR0x1q7EG999zvDJ7gmbpjOR6EJuCHVOpwMTC33NCBFxH1RsgxZKDSxcP2bCRHDzeKIm3hX7kAyGvUJ6TMmJwtMWVJ+vzB+tNNRIJ2ROTnmSBHI8sE4lqLHjdvc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nh2ZL69K; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nh2ZL69K" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 80865C116D0; Tue, 24 Feb 2026 16:36:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771950988; bh=Dnjjqe21GBxvZ5X+pDOO1X6vhlAY3v8cSHNjce7QgVM=; h=Date:From:To:Cc:Subject:References:From; b=nh2ZL69K4D/Mq/cCjA2nlvoXRog9VRE9t8OaIwBt/EmLyxJP9Zz1w1L2IK49G8ek6 ywZ6iqNmDTxIiVUukneZjeXnwG8mW/of68IaiLm5/GH0U6uBTFeX0cC3xuEaCEdceK lppnPmMCaKSD316ZNoaRd73XntemV8jZW7TEm6l5tr3/BiRB3Wod6XgICtWGk/FybI 
vJECXYFYEQO25xbzRKD+/VD+AyMlKjX+eYILVh8q6H6/dVfULb4cIj4038ReWgzbWm C2enDkvRHxQweWXwneMoeLCg2lgjqc1YTr3Cnko/BnQ++yuyZz/dkKnfWKhAq8wwEv UzZTxygEmnLkA== Date: Tue, 24 Feb 2026 17:36:24 +0100 Message-ID: <20260224163429.741886362@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 15/48] x86: Inline TSC reads in timekeeping References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Avoid the overhead of the indirect call for a single instruction to read the TSC. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- arch/x86/Kconfig | 1 + arch/x86/include/asm/clock_inlined.h | 14 ++++++++++++++ arch/x86/kernel/tsc.c | 1 + 3 files changed, 16 insertions(+) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -141,6 +141,7 @@ config X86 select ARCH_USE_SYM_ANNOTATIONS select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH select ARCH_WANT_DEFAULT_BPF_JIT if X86_64 + select ARCH_WANTS_CLOCKSOURCE_READ_INLINE if X86_64 select ARCH_WANTS_DYNAMIC_TASK_STRUCT select ARCH_WANTS_NO_INSTR select ARCH_WANT_GENERAL_HUGETLB --- /dev/null +++ b/arch/x86/include/asm/clock_inlined.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_CLOCK_INLINED_H +#define _ASM_X86_CLOCK_INLINED_H + +#include + +struct clocksource; + +static __always_inline u64 arch_inlined_clocksource_read(struct clocksourc= e *cs) +{ + return (u64)rdtsc_ordered(); +} + +#endif --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1201,6 +1201,7 @@ static struct clocksource clocksource_ts .mask =3D 
CLOCKSOURCE_MASK(64), .flags =3D CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VALID_FOR_HRES | + CLOCK_SOURCE_CAN_INLINE_READ | CLOCK_SOURCE_MUST_VERIFY | CLOCK_SOURCE_VERIFY_PERCPU, .id =3D CSID_X86_TSC, From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 334FA3A7F7C for ; Tue, 24 Feb 2026 16:36:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950993; cv=none; b=Wl5d2Gh/esLSfx6zox4o+riA6PJiMoWUkXm4uk+VcCwNGvAUVbk3pxJqDSRQqy1jcUHtT3DaoYwoRgdShefP9ztjNchMdkHy8l1l0J9tA/vA+9CA7R6ymcXIlH6nXKibW7D4wY2fVXMQ9Lc1El6Ej79wRW57hsUi8jnVFHPXci0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950993; c=relaxed/simple; bh=+KRsXSo8ov1yzjRQREo4WOccpNE4GYaKU4CCBoVqcaU=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=sQY13e8ANOMGmSOofx5aiNVxMO1v5CP519g5bXIQXUJeya3V12WpVDqWL8/hiBSpNUM3jJ9jJYL335hPMnQnnhFztvr28bLyAULCy7PUOUo8PEAuMb9x2zfYDayyHjsMwIMw3ZZFBTEUA1nABkThHs5iHSm+uYX77a/m0MWQHUo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=aKZuEOVr; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aKZuEOVr" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 82146C116D0; Tue, 24 Feb 2026 16:36:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771950993; bh=+KRsXSo8ov1yzjRQREo4WOccpNE4GYaKU4CCBoVqcaU=; h=Date:From:To:Cc:Subject:References:From; b=aKZuEOVr8/EDuzKTjR4ztOrkG83+M1i9hnBMZLOlxCmYHYyOIH1Rfvta2QXOkXb52 
P3FdFQ69pvVG5S6s6ucp6Mg8J3LOOj3HXJ3v+va9TleXqVaSa+TdB/U+Jp7Dfo4i+D XhE6srILbr0/v2j7mNPrQIdl2lI58o+B6FfmVtdSNR8FOPHy20RNWt60TH4J5VUlxM w+Uer2YMrhAncp3fZVCycT3lgfameFpAJdcdvIg/pkxcfYgTjj84Il3fuXWr2CkVFK V0kIBCsar5fSSo+FIs+v3mB8jTcEXgW5Sf6BIGXBe7lko4w8vL8oeGxzdBvq+aBe1S PZXulfSqxshJg== Date: Tue, 24 Feb 2026 17:36:29 +0100 Message-ID: <20260224163429.809059527@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 16/48] x86/apic: Remove pointless fence in lapic_next_deadline() References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" lapic_next_deadline() contains a fence before the TSC read and the write to the TSC_DEADLINE MSR with a content free and therefore useless comment: /* This MSR is special and need a special fence: */ The MSR is not really special. It is just not a serializing MSR, but that does not matter at all in this context as all of these operations are strictly CPU local. The only thing the fence prevents is that the RDTSC is speculated ahead, but that's not really relevant as the delta is calculated way before based on a previous TSC read and therefore inaccurate by definition. So removing the fence is just making it slightly more inaccurate in the worst case, but that is irrelevant as it's way below the actual system immanent latencies and variations. 
Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- arch/x86/kernel/apic/apic.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -412,22 +412,20 @@ EXPORT_SYMBOL_GPL(setup_APIC_eilvt); /* * Program the next event, relative to now */ -static int lapic_next_event(unsigned long delta, - struct clock_event_device *evt) +static int lapic_next_event(unsigned long delta, struct clock_event_device= *evt) { apic_write(APIC_TMICT, delta); return 0; } =20 -static int lapic_next_deadline(unsigned long delta, - struct clock_event_device *evt) +static int lapic_next_deadline(unsigned long delta, struct clock_event_dev= ice *evt) { - u64 tsc; + /* + * There is no weak_wrmsr_fence() required here as all of this is purely + * CPU local. Avoid the [ml]fence overhead. + */ + u64 tsc =3D rdtsc(); =20 - /* This MSR is special and need a special fence: */ - weak_wrmsr_fence(); - - tsc =3D rdtsc(); wrmsrq(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); return 0; } From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6AF863A7F77 for ; Tue, 24 Feb 2026 16:36:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950998; cv=none; b=laQ0HsWnkKUAE+MscabA6gkwP/c0Zkwmdr6fyFcX68ThYXeouVHAPZsD568gFmklikR8JVlY1coKuOuD6NLI9JS34AIN/VAK36haWN8BpQJ+7X3bnXkxZYYf3F5DxDaz/tyXUmBZxTfn54/EE84PC3KvzZflKDSk46l2W36/V8E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771950998; c=relaxed/simple; bh=KMAUtK4LlktSqiMOuizW9zq5JIPy3BF7H8cr4poWS2s=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: 
Content-Type; b=Ij8YXniE7Tg8x755X5vbK/BS3E3fwCulgvCoCqpYv2SlaMQKJkhsgQOPt3Nsx2yv/xAnmhMf6duBAFNH7Lq07jSmWfM0UTdDrIXJ78f5HOFe06RXlOFj+guJqZhCODUr6UcRZkuFhJroHTOk+M2FRYWO8aJCWqx+01+EnbF/dxA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KO9tfLgf; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KO9tfLgf" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6FE49C19422; Tue, 24 Feb 2026 16:36:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771950998; bh=KMAUtK4LlktSqiMOuizW9zq5JIPy3BF7H8cr4poWS2s=; h=Date:From:To:Cc:Subject:References:From; b=KO9tfLgfIRi7e7gwNb4Dmhp9jS5tI61LpOumrgZYa+8PqKHX51FI8tMJV/4a+vRry oXg7sRLKblyYsmWS8tA2RyMtrxcHrYsGZLB1MRevbRRoTISQbDoqEtuirmpUmIHolJ BNxSaz2gJk2xzRejM0klHDbxNrNFhBfQs3YLgniPOoESaMmJRmO/SKSfnhTUUuZcyD g7aYJV3KZq1GVekpq2PDJNnIDUUK1TmO6vzmlAdXecghz8DlQMH1HDB0DKrtVfuHGQ x5SqGbdVhdWrPEJDm1rStwwF1vovkPQ4Uv10rbZPvT99M85cneHYaQH9EuZYmKi7uV +VrSH6KR9VyPw== Date: Tue, 24 Feb 2026 17:36:34 +0100 Message-ID: <20260224163429.877429827@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 17/48] x86/apic: Avoid the PVOPS indirection for the TSC deadline timer References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" XEN PV does not emulate the TSC deadline timer, so the PVOPS indirection for writing the deadline MSR can 
be avoided completely. Use native_wrmsrq() instead. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- arch/x86/kernel/apic/apic.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -426,7 +426,7 @@ static int lapic_next_deadline(unsigned */ u64 tsc =3D rdtsc(); =20 - wrmsrq(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); + native_wrmsrq(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); return 0; } =20 @@ -450,7 +450,7 @@ static int lapic_timer_shutdown(struct c * the timer _and_ zero the counter registers: */ if (v & APIC_LVT_TIMER_TSCDEADLINE) - wrmsrq(MSR_IA32_TSC_DEADLINE, 0); + native_wrmsrq(MSR_IA32_TSC_DEADLINE, 0); else apic_write(APIC_TMICT, 0); =20 @@ -547,6 +547,11 @@ static __init bool apic_validate_deadlin =20 if (!boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) return false; + + /* XEN_PV does not support it, but be paranoid about it */ + if (boot_cpu_has(X86_FEATURE_XENPV)) + goto clear; + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) return true; =20 @@ -559,9 +564,11 @@ static __init bool apic_validate_deadlin if (boot_cpu_data.microcode >=3D rev) return true; =20 - setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER); pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; " "please update microcode to version: 0x%x (or later)\n", rev); + +clear: + setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER); return false; } From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 936D33A63EB for ; Tue, 24 Feb 2026 16:36:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951003; cv=none;
b=bqD71504E+0C9rPO/aKiJr4JByPQ7LIkUtzhVmUHg7glusDxq7f/bBk0KW5sNoqdQnEEDWQGJ5mj+DTETHGt6kqbGsaXxVslQgvNsvV2Y+EoFvz2TOjJBl9jV1urc09LTgdj9ilpTBHzHzk8XrTXiNbhBNIzk7TPLIAwRRk1cVM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951003; c=relaxed/simple; bh=kNINeP5ljjuNHJTHfRjrgrsBfkEf98j5jk496403GS0=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=a/zAHCqhHfRSP8H2R9niIkcvIcI1x9bggnlvnGBocm1J653BMwS5IdmSqD69gIsQqUkN72QwAbyZy5e63p5FVhL1ZCn/1+q6ARMqgi8pK6+uvKPtdhG/jo9WzCYuQxu1qv3RQAtwW2/QXdeU+q569tiw/VmuvT1NeTsg+SK27sw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=cKSQRMqu; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="cKSQRMqu" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A3196C116D0; Tue, 24 Feb 2026 16:36:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771951003; bh=kNINeP5ljjuNHJTHfRjrgrsBfkEf98j5jk496403GS0=; h=Date:From:To:Cc:Subject:References:From; b=cKSQRMquyuqvNq/EzSbbV9vsSIT+VAFwvjoIsxa3rpNivF/FfL7qojlpjo1cbIqTJ WfP9RAZ2kISUXS1TC8Ta5Z3jkyL/59iYCL4CrTaQ9jo42FUT6zCGa+zZ5oar3mko/4 FICvYbU5U3KIdUoxhneVL4+Et2QpMvK2Nht6lsg7cZlu729ZkW6n7L8bPwQHL/16zQ eebhriQvOQu8zot/A8vR+/0EbA/KWAM6OOZIrQEvDHYG8kafK8DGGQqna2ot3dgY68 mru4V3NgAnJofS4LZWunJa3BFHSf8dfrFl6jCxQCdVsJPcFte0viVcBGYEN4D/bLAb cB4zlLngdfSuQ== Date: Tue, 24 Feb 2026 17:36:40 +0100 Message-ID: <20260224163429.944763521@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 18/48] timekeeping: Provide 
infrastructure for coupled clockevents References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some architectures have clockevent devices which are coupled to the system clocksource by implementing a less than or equal comparator which compares the programmed absolute expiry time against the underlying time counter. Well-known examples are the TSC/TSC deadline timer and the S390 TOD clocksource/comparator. While the concept is nice it has some downsides: 1) The clockevents core code is strictly based on relative expiry times as that's the most common case for clockevent device hardware. That requires converting the absolute expiry time provided by the caller (hrtimers, NOHZ code) to a relative expiry time by reading and subtracting the current time. The clockevent::set_next_event() callback must then read the counter again to convert the relative expiry back into an absolute one. 2) The conversion factors from nanoseconds to counter clock cycles are set up when the clockevent is registered. When NTP applies corrections then the clockevent conversion factors can deviate substantially from the clocksource conversion, which results in timers firing either late or, in the worst case, early. An early expiry then needs to do a reprogram with a short delta. In most cases this is papered over by the fact that the read in the set_next_event() callback happens after the read which is used to calculate the delta. So the tendency is that timers expire mostly late. All of this can be avoided by providing support for these devices in the core code: 1) The timekeeping core keeps track of the last update to the clocksource by storing the base nanoseconds and the corresponding clocksource counter value.
That's used to keep the conversion math for reading the time within 64-bit in the common case. This information can be used to avoid both reads of the underlying clocksource in the clockevents reprogramming path: delta =3D expiry - base_ns; cycles =3D base_cycles + ((delta * clockevent::mult) >> clockevent::shift); The resulting cycles value can be directly used to program the comparator. 2) As #1 no longer provides the "compensation" through the second read, the deviation between the clocksource and clockevent conversions caused by NTP becomes more prominent. This can be cured by letting the timekeeping core compute and store the reverse conversion factors when the clocksource cycles to nanoseconds factors are modified by NTP: CS::MULT / (1 << CS::SHIFT) =3D (1 << NS_TO_CYC_SHIFT) / NS_TO_CYC_MULT Ergo: NS_TO_CYC_MULT =3D (1 << (CS::SHIFT + NS_TO_CYC_SHIFT)) / CS::MULT The NS_TO_CYC_SHIFT value is calculated when the clocksource is installed so that it aims for a one hour maximum sleep time.
Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- include/linux/clocksource.h | 1=20 include/linux/timekeeper_internal.h | 8 ++ kernel/time/Kconfig | 3=20 kernel/time/timekeeping.c | 110 +++++++++++++++++++++++++++++++= +++++ kernel/time/timekeeping.h | 2=20 5 files changed, 124 insertions(+) --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -150,6 +150,7 @@ struct clocksource { #define CLOCK_SOURCE_RESELECT 0x100 #define CLOCK_SOURCE_VERIFY_PERCPU 0x200 #define CLOCK_SOURCE_CAN_INLINE_READ 0x400 +#define CLOCK_SOURCE_HAS_COUPLED_CLOCK_EVENT 0x800 =20 /* simplify initialization of mask field */ #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0) --- a/include/linux/timekeeper_internal.h +++ b/include/linux/timekeeper_internal.h @@ -72,6 +72,10 @@ struct tk_read_base { * @id: The timekeeper ID * @tkr_raw: The readout base structure for CLOCK_MONOTONIC_RAW * @raw_sec: CLOCK_MONOTONIC_RAW time in seconds + * @cs_id: The ID of the current clocksource + * @cs_ns_to_cyc_mult: Multiplier for nanoseconds to cycles conversion + * @cs_ns_to_cyc_shift: Shift value for nanoseconds to cycles conversion + * @cs_ns_to_cyc_maxns: Maximum nanoseconds to cycles conversion range * @clock_was_set_seq: The sequence number of clock was set events * @cs_was_changed_seq: The sequence number of clocksource change events * @clock_valid: Indicator for valid clock @@ -159,6 +163,10 @@ struct timekeeper { u64 raw_sec; =20 /* Cacheline 3 and 4 (timekeeping internal variables): */ + enum clocksource_ids cs_id; + u32 cs_ns_to_cyc_mult; + u32 cs_ns_to_cyc_shift; + u64 cs_ns_to_cyc_maxns; unsigned int clock_was_set_seq; u8 cs_was_changed_seq; u8 clock_valid; --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -47,6 +47,9 @@ config GENERIC_CLOCKEVENTS_BROADCAST_IDL config GENERIC_CLOCKEVENTS_MIN_ADJUST bool =20 +config GENERIC_CLOCKEVENTS_COUPLED + bool + # Generic update of CMOS clock config GENERIC_CMOS_UPDATE bool --- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c @@ -391,6 +391,20 @@ static void tk_setup_internals(struct ti tk->tkr_raw.mult =3D clock->mult; tk->ntp_err_mult =3D 0; tk->skip_second_overflow =3D 0; + + tk->cs_id =3D clock->id; + + /* Coupled clockevent data */ + if (IS_ENABLED(CONFIG_GENERIC_CLOCKEVENTS_COUPLED) && + clock->flags & CLOCK_SOURCE_HAS_COUPLED_CLOCK_EVENT) { + /* + * Aim for an one hour maximum delta and use KHz to handle + * clocksources with a frequency above 4GHz correctly as + * the frequency argument of clocks_calc_mult_shift() is u32. + */ + clocks_calc_mult_shift(&tk->cs_ns_to_cyc_mult, &tk->cs_ns_to_cyc_shift, + NSEC_PER_MSEC, clock->freq_khz, 3600 * 1000); + } } =20 /* Timekeeper helper functions. */ @@ -720,6 +734,36 @@ static inline void tk_update_ktime_data( tk->tkr_raw.base =3D ns_to_ktime(tk->raw_sec * NSEC_PER_SEC); } =20 +static inline void tk_update_ns_to_cyc(struct timekeeper *tks, struct time= keeper *tkc) +{ + struct tk_read_base *tkrs =3D &tks->tkr_mono; + struct tk_read_base *tkrc =3D &tkc->tkr_mono; + unsigned int shift; + + if (!IS_ENABLED(CONFIG_GENERIC_CLOCKEVENTS_COUPLED) || + !(tkrs->clock->flags & CLOCK_SOURCE_HAS_COUPLED_CLOCK_EVENT)) + return; + + if (tkrs->mult =3D=3D tkrc->mult && tkrs->shift =3D=3D tkrc->shift) + return; + /* + * The conversion math is simple: + * + * CS::MULT (1 << NS_TO_CYC_SHIFT) + * --------------- =3D ---------------------- + * (1 << CS:SHIFT) NS_TO_CYC_MULT + * + * Ergo: + * + * NS_TO_CYC_MULT =3D (1 << (CS::SHIFT + NS_TO_CYC_SHIFT)) / CS::MULT + * + * NS_TO_CYC_SHIFT has been set up in tk_setup_internals() + */ + shift =3D tkrs->shift + tks->cs_ns_to_cyc_shift; + tks->cs_ns_to_cyc_mult =3D (u32)div_u64(1ULL << shift, tkrs->mult); + tks->cs_ns_to_cyc_maxns =3D div_u64(tkrs->clock->mask, tks->cs_ns_to_cyc_= mult); +} + /* * Restore the shadow timekeeper from the real timekeeper. 
*/ @@ -754,6 +798,7 @@ static void timekeeping_update_from_shad tk->tkr_mono.base_real =3D tk->tkr_mono.base + tk->offs_real; =20 if (tk->id =3D=3D TIMEKEEPER_CORE) { + tk_update_ns_to_cyc(tk, &tkd->timekeeper); update_vsyscall(tk); update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET); =20 @@ -808,6 +853,71 @@ static void timekeeping_forward_now(stru tk_update_coarse_nsecs(tk); } =20 +/* + * ktime_expiry_to_cycles - Convert an expiry time to clocksource cycles + * @id: Clocksource ID which is required for validity + * @expires_ns: Absolute CLOCK_MONOTONIC expiry time (nsecs) to be convert= ed + * @cycles: Pointer to storage for corresponding absolute cycles value + * + * Convert a CLOCK_MONOTONIC based absolute expiry time to a cycles value + * based on the correlated clocksource of the clockevent device by using + * the base nanoseconds and cycles values of the last timekeeper update and + * converting the delta between @expires_ns and base nanoseconds to cycles. + * + * This only works for clockevent devices which are using a less than or + * equal comparator against the clocksource. + * + * Utilizing this avoids two clocksource reads for such devices, the + * ktime_get() in clockevents_program_event() to calculate the delta expiry + * value and the readout in the device::set_next_event() callback to + * convert the delta back to an absolute comparator value. + * + * Returns: True if @id matches the current clocksource ID, false otherwise + */ +bool ktime_expiry_to_cycles(enum clocksource_ids id, ktime_t expires_ns, u= 64 *cycles) +{ + struct timekeeper *tk =3D &tk_core.timekeeper; + struct tk_read_base *tkrm =3D &tk->tkr_mono; + ktime_t base_ns, delta_ns, max_ns; + u64 base_cycles, delta_cycles; + unsigned int seq; + u32 mult, shift; + + /* + * Racy check to avoid the seqcount overhead when ID does not match.
If + * the relevant clocksource is installed concurrently, then this will + * just delay the switch over to this mechanism until the next event is + * programmed. If the ID is not matching the clock events code will use + * the regular relative set_next_event() callback as before. + */ + if (data_race(tk->cs_id) !=3D id) + return false; + + do { + seq =3D read_seqcount_begin(&tk_core.seq); + + if (tk->cs_id !=3D id) + return false; + + base_cycles =3D tkrm->cycle_last; + base_ns =3D tkrm->base + (tkrm->xtime_nsec >> tkrm->shift); + + mult =3D tk->cs_ns_to_cyc_mult; + shift =3D tk->cs_ns_to_cyc_shift; + max_ns =3D tk->cs_ns_to_cyc_maxns; + + } while (read_seqcount_retry(&tk_core.seq, seq)); + + /* Prevent negative deltas and multiplication overflows */ + delta_ns =3D min(expires_ns - base_ns, max_ns); + delta_ns =3D max(delta_ns, 0); + + /* Convert to cycles */ + delta_cycles =3D ((u64)delta_ns * mult) >> shift; + *cycles =3D base_cycles + delta_cycles; + return true; +} + /** * ktime_get_real_ts64 - Returns the time of day in a timespec64. 
* @ts: pointer to the timespec to be set --- a/kernel/time/timekeeping.h +++ b/kernel/time/timekeeping.h @@ -9,6 +9,8 @@ extern ktime_t ktime_get_update_offsets_ ktime_t *offs_boot, ktime_t *offs_tai); =20 +bool ktime_expiry_to_cycles(enum clocksource_ids id, ktime_t expires_ns, u= 64 *cycles); + extern int timekeeping_valid_for_hres(void); extern u64 timekeeping_max_deferment(void); extern void timekeeping_warp_clock(void); From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C78A3A9015 for ; Tue, 24 Feb 2026 16:36:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951008; cv=none; b=j6QiqaXlSv2wcV3xFXg5rDJiEl0oUurMKIM8JiTpeJ40oHVIbXxTr7+Bh/fkOx0Szg35edstMhdbQK/0rmkhGzxIJYdkKYQeBR9TYhP2KGuCiB/1B5AjIc6WxT0Ii0qaSIwL4dMZ4HlCvWNOXMvPtKIyi2ICxEJ0N6KudCjs7hY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951008; c=relaxed/simple; bh=/jAQQnb5W7A3O1STDZbNJNWP+kqGgMwYzPAAGAjyce8=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=W08zJQ+oRILz+0Zl/YB3O9fiBkNE8p9r7EMl7Rj3qHtGlnfT0aFmnNOpRmHEiJ+Xu/YdRbr+UEk2TEPtzbpjvY8KfB+j1B01fQCnyrDMF0KTRYEcJxjp0IPwH8wuXSVKuaqLqvObdFGp36qgvwxiyDrSU/mnA6pyJ+NrSuaiyto= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jf+Z0f4j; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jf+Z0f4j" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 761D9C19422; Tue, 24 Feb 2026 16:36:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=kernel.org; s=k20201202; t=1771951007; bh=/jAQQnb5W7A3O1STDZbNJNWP+kqGgMwYzPAAGAjyce8=; h=Date:From:To:Cc:Subject:References:From; b=jf+Z0f4jlXonNX7CXeY+5rOa9+Jz1yq7ghcXgXtpC/cMVqSLWvlgHCaujRwn6M0O3 PDlZ/NB0M3KKBAuTcUj1qc1lFixkSyIYgeV/W5yOArBaaTnvgCu0kdNgEUfHix19vj 5xvfAvOgFJfFuBvv5kM7WbjzFWTymbgpDJvAWrS1tSeWf+bIb24Zgm52HmRxEhoqor w1lbggRAd3iJx1fHNeo/aPRShbeKr6v+wmJihMQ2H/as8cqH8ant74Wxqd4PSFytnu wChk24SWZ5Jl+BOmwrF9OlhBiK2sYhWgDXbXDHmThDGgy38t7IOynnBzwRTjQNQYD7 tq6yUQd3tdOSA== Date: Tue, 24 Feb 2026 17:36:45 +0100 Message-ID: <20260224163430.010425428@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 19/48] clockevents: Provide support for clocksource coupled comparators References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some clockevent devices are coupled to the system clocksource by implementing a less than or equal comparator which compares the programmed absolute expiry time against the underlying time counter. The timekeeping core provides a function to convert an absolute CLOCK_MONOTONIC based expiry time to an absolute clock cycles value which can be directly fed into the comparator. That spares two time reads in the next event programming path, one to convert the absolute nanoseconds time to a delta value and the other to convert the delta value back to an absolute time value suitable for the comparator. Provide a new clocksource callback which takes the absolute cycle value and wire it up in clockevents_program_event().
Similar to clocksources, allow architectures to inline the rearm
operation.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/clockchips.h |    7 +++++--
 kernel/time/Kconfig        |    4 ++++
 kernel/time/clockevents.c  |   44 +++++++++++++++++++++++++++++++++++++++----
 3 files changed, 48 insertions(+), 7 deletions(-)

--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -43,8 +43,9 @@ enum clock_event_state {
 /*
  * Clock event features
  */
-# define CLOCK_EVT_FEAT_PERIODIC	0x000001
-# define CLOCK_EVT_FEAT_ONESHOT		0x000002
+# define CLOCK_EVT_FEAT_PERIODIC		0x000001
+# define CLOCK_EVT_FEAT_ONESHOT			0x000002
+# define CLOCK_EVT_FEAT_CLOCKSOURCE_COUPLED	0x000004

 /*
  * x86(64) specific (mis)features:
@@ -100,6 +101,7 @@ struct clock_event_device {
 	void			(*event_handler)(struct clock_event_device *);
 	int			(*set_next_event)(unsigned long evt, struct clock_event_device *);
 	int			(*set_next_ktime)(ktime_t expires, struct clock_event_device *);
+	void			(*set_next_coupled)(u64 cycles, struct clock_event_device *);
 	ktime_t			next_event;
 	u64			max_delta_ns;
 	u64			min_delta_ns;
@@ -107,6 +109,7 @@ struct clock_event_device {
 	u32			shift;
 	enum clock_event_state	state_use_accessors;
 	unsigned int		features;
+	enum clocksource_ids	cs_id;
 	unsigned long		retries;

 	int			(*set_state_periodic)(struct clock_event_device *);
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -50,6 +50,10 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
 config GENERIC_CLOCKEVENTS_COUPLED
 	bool

+config GENERIC_CLOCKEVENTS_COUPLED_INLINE
+	select GENERIC_CLOCKEVENTS_COUPLED
+	bool
+
 # Generic update of CMOS clock
 config GENERIC_CMOS_UPDATE
 	bool
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -292,6 +292,38 @@ static int clockevents_program_min_delta

 #endif /* CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST */

+#ifdef CONFIG_GENERIC_CLOCKEVENTS_COUPLED
+#ifdef CONFIG_GENERIC_CLOCKEVENTS_COUPLED_INLINE
+#include
+#else
+static __always_inline void
+arch_inlined_clockevent_set_next_coupled(u64 cycles, struct clock_event_device *dev) { }
+#endif
+
+static inline bool clockevent_set_next_coupled(struct clock_event_device *dev, ktime_t expires)
+{
+	u64 cycles;
+
+	if (unlikely(!(dev->features & CLOCK_EVT_FEAT_CLOCKSOURCE_COUPLED)))
+		return false;
+
+	if (unlikely(!ktime_expiry_to_cycles(dev->cs_id, expires, &cycles)))
+		return false;
+
+	if (IS_ENABLED(CONFIG_GENERIC_CLOCKEVENTS_COUPLED_INLINE))
+		arch_inlined_clockevent_set_next_coupled(cycles, dev);
+	else
+		dev->set_next_coupled(cycles, dev);
+	return true;
+}
+
+#else
+static inline bool clockevent_set_next_coupled(struct clock_event_device *dev, ktime_t expires)
+{
+	return false;
+}
+#endif
+
 /**
  * clockevents_program_event - Reprogram the clock event device.
  * @dev:	device to program
@@ -300,11 +332,10 @@ static int clockevents_program_min_delta
  *
  * Returns 0 on success, -ETIME when the event is in the past.
  */
-int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
-			      bool force)
+int clockevents_program_event(struct clock_event_device *dev, ktime_t expires, bool force)
 {
-	unsigned long long clc;
 	int64_t delta;
+	u64 cycles;
 	int rc;

 	if (WARN_ON_ONCE(expires < 0))
@@ -323,6 +354,9 @@ int clockevents_program_event(struct clo
 	if (unlikely(dev->features & CLOCK_EVT_FEAT_HRTIMER))
 		return dev->set_next_ktime(expires, dev);

+	if (likely(clockevent_set_next_coupled(dev, expires)))
+		return 0;
+
 	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
 	if (delta <= 0)
 		return force ?
			clockevents_program_min_delta(dev) : -ETIME;
@@ -330,8 +364,8 @@ int clockevents_program_event(struct clo
 	delta = min(delta, (int64_t) dev->max_delta_ns);
 	delta = max(delta, (int64_t) dev->min_delta_ns);

-	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
-	rc = dev->set_next_event((unsigned long) clc, dev);
+	cycles = ((u64)delta * dev->mult) >> dev->shift;
+	rc = dev->set_next_event((unsigned long) cycles, dev);

 	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
 }

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:36:49 +0100
Message-ID: <20260224163430.076565985@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 20/48] x86/apic: Enable TSC coupled programming mode
References: <20260224163022.795809588@kernel.org>

The TSC deadline timer is directly coupled to the TSC, and setting the
next deadline is tedious as the clockevents core code converts the
CLOCK_MONOTONIC based absolute expiry time to a relative expiry by
reading the current time from the TSC. It converts that delta to cycles
and hands the result to lapic_next_deadline(), which then has to read
the TSC again and add the delta to program the timer.

The core code now supports coupled clock event devices and can provide
the expiry time in TSC cycles directly without reading the TSC at
all. This obviously works only when the TSC is the current clocksource,
but that's the default for all modern CPUs which implement the TSC
deadline timer.
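The difference between the two programming paths can be modeled in a few
lines of userspace C. The names are hypothetical and a counter variable
stands in for actual TSC reads; this is a sketch of the idea, not the
kernel code:

```c
#include <stdint.h>

/* Toy model contrasting relative vs. coupled deadline programming. */
static uint64_t tsc;		/* simulated TSC value */
static unsigned int tsc_reads;	/* counts simulated TSC reads */
static uint64_t deadline;	/* simulated deadline register */

static uint64_t read_tsc(void)
{
	tsc_reads++;
	return tsc;
}

/*
 * Relative path: the core turns the absolute expiry into a delta by
 * reading the clock, then the driver reads the clock again to turn the
 * delta back into an absolute deadline.
 */
static void program_relative(uint64_t expires_cycles)
{
	uint64_t delta = expires_cycles - read_tsc();	/* abs -> delta */

	deadline = read_tsc() + delta;			/* delta -> abs */
}

/* Coupled path: the absolute cycle value is written directly. */
static void program_coupled(uint64_t expires_cycles)
{
	deadline = expires_cycles;			/* no clock read */
}
```

Both paths end up with the same deadline value, but the relative path
costs two clock reads per reprogram while the coupled path costs none.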
If the TSC is not the current clocksource (e.g. during early boot), the
core code falls back to the relative set_next_event() callback as before.

Signed-off-by: Thomas Gleixner
Cc: x86@kernel.org
Acked-by: Peter Zijlstra (Intel)
Tested-by: Nathan Chancellor
---
 arch/x86/Kconfig                     |    1 +
 arch/x86/include/asm/clock_inlined.h |    8 ++++++++
 arch/x86/kernel/apic/apic.c          |   12 ++++++------
 arch/x86/kernel/tsc.c                |    3 ++-
 4 files changed, 17 insertions(+), 7 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -164,6 +164,7 @@ config X86
 	select EDAC_SUPPORT
 	select GENERIC_CLOCKEVENTS_BROADCAST	if X86_64 || (X86_32 && X86_LOCAL_APIC)
 	select GENERIC_CLOCKEVENTS_BROADCAST_IDLE	if GENERIC_CLOCKEVENTS_BROADCAST
+	select GENERIC_CLOCKEVENTS_COUPLED_INLINE	if X86_64
 	select GENERIC_CLOCKEVENTS_MIN_ADJUST
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_CPU_AUTOPROBE
--- a/arch/x86/include/asm/clock_inlined.h
+++ b/arch/x86/include/asm/clock_inlined.h
@@ -11,4 +11,12 @@ static __always_inline u64 arch_inlined_
 	return (u64)rdtsc_ordered();
 }

+struct clock_event_device;
+
+static __always_inline void
+arch_inlined_clockevent_set_next_coupled(u64 cycles, struct clock_event_device *evt)
+{
+	native_wrmsrq(MSR_IA32_TSC_DEADLINE, cycles);
+}
+
 #endif
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -591,14 +591,14 @@ static void setup_APIC_timer(void)

 	if (this_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER)) {
 		levt->name = "lapic-deadline";
-		levt->features &= ~(CLOCK_EVT_FEAT_PERIODIC |
-				    CLOCK_EVT_FEAT_DUMMY);
+		levt->features &= ~(CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_DUMMY);
+		levt->features |= CLOCK_EVT_FEAT_CLOCKSOURCE_COUPLED;
+		levt->cs_id = CSID_X86_TSC;
 		levt->set_next_event = lapic_next_deadline;
-		clockevents_config_and_register(levt,
-						tsc_khz * (1000 / TSC_DIVISOR),
-						0xF, ~0UL);
-	} else
+		clockevents_config_and_register(levt, tsc_khz * (1000 / TSC_DIVISOR), 0xF, ~0UL);
+	} else {
 		clockevents_register_device(levt);
+	}

 	apic_update_vector(smp_processor_id(), LOCAL_TIMER_VECTOR, true);
 }
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1203,7 +1203,8 @@ static struct clocksource clocksource_ts
 				  CLOCK_SOURCE_VALID_FOR_HRES |
 				  CLOCK_SOURCE_CAN_INLINE_READ |
 				  CLOCK_SOURCE_MUST_VERIFY |
-				  CLOCK_SOURCE_VERIFY_PERCPU,
+				  CLOCK_SOURCE_VERIFY_PERCPU |
+				  CLOCK_SOURCE_HAS_COUPLED_CLOCK_EVENT,
 	.id			= CSID_X86_TSC,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:36:54 +0100
Message-ID: <20260224163430.143098153@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 21/48] hrtimer: Add debug object init assertion
References: <20260224163022.795809588@kernel.org>

The debug object coverage in hrtimer_start_range_ns() happens too late to
do anything useful. Implement the init assertion part and invoke it early
in hrtimer_start_range_ns().

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c |   43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -441,12 +441,37 @@ static bool hrtimer_fixup_free(void *add
 	}
 }

+/* Stub timer callback for improperly used timers. */
+static enum hrtimer_restart stub_timer(struct hrtimer *unused)
+{
+	WARN_ON_ONCE(1);
+	return HRTIMER_NORESTART;
+}
+
+/*
+ * hrtimer_fixup_assert_init is called when:
+ * - an untracked/uninit-ed object is found
+ */
+static bool hrtimer_fixup_assert_init(void *addr, enum debug_obj_state state)
+{
+	struct hrtimer *timer = addr;
+
+	switch (state) {
+	case ODEBUG_STATE_NOTAVAILABLE:
+		hrtimer_setup(timer, stub_timer, CLOCK_MONOTONIC, 0);
+		return true;
+	default:
+		return false;
+	}
+}
+
 static const struct debug_obj_descr hrtimer_debug_descr = {
-	.name		= "hrtimer",
-	.debug_hint	= hrtimer_debug_hint,
-	.fixup_init	= hrtimer_fixup_init,
-	.fixup_activate	= hrtimer_fixup_activate,
-	.fixup_free	= hrtimer_fixup_free,
+	.name			= "hrtimer",
+	.debug_hint		= hrtimer_debug_hint,
+	.fixup_init		= hrtimer_fixup_init,
+	.fixup_activate		= hrtimer_fixup_activate,
+	.fixup_free		= hrtimer_fixup_free,
+	.fixup_assert_init	= hrtimer_fixup_assert_init,
 };

 static inline void debug_hrtimer_init(struct hrtimer *timer)
@@ -470,6 +495,11 @@ static inline void debug_hrtimer_deactiv
 	debug_object_deactivate(timer, &hrtimer_debug_descr);
 }

+static inline void debug_hrtimer_assert_init(struct hrtimer *timer)
+{
+	debug_object_assert_init(timer, &hrtimer_debug_descr);
+}
+
 void destroy_hrtimer_on_stack(struct hrtimer *timer)
 {
 	debug_object_free(timer, &hrtimer_debug_descr);
@@ -483,6 +513,7 @@ static inline void debug_hrtimer_init_on
 static inline void debug_hrtimer_activate(struct hrtimer *timer, enum hrtimer_mode mode) { }
 static inline void debug_hrtimer_deactivate(struct hrtimer *timer) { }
+static inline void debug_hrtimer_assert_init(struct hrtimer *timer) { }
 #endif

 static inline void debug_setup(struct hrtimer *timer, clockid_t clockid, enum hrtimer_mode mode)
@@ -1359,6 +1390,8 @@ void hrtimer_start_range_ns(struct hrtim
 	struct hrtimer_clock_base *base;
 	unsigned long flags;

+	debug_hrtimer_assert_init(timer);
+
 	/*
 	 * Check whether the
	 * HRTIMER_MODE_SOFT bit and hrtimer.is_soft
	 * match on CONFIG_PREEMPT_RT = n. With PREEMPT_RT check the hard

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:36:59 +0100
Message-ID: <20260224163430.208491877@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 22/48] hrtimer: Reduce trace noise in hrtimer_start()
References: <20260224163022.795809588@kernel.org>

hrtimer_start(), when invoked with an already armed timer, traces like:

 -.. [032] d.h2. 5.002263: hrtimer_cancel: hrtimer= ....
 -.. [032] d.h1. 5.002263: hrtimer_start: hrtimer= ....

which is incorrect as the timer does not get canceled; just the expiry
time changes. The internal dequeue operation which is required for that
is not really interesting for trace analysis, but it makes it tedious to
keep real cancellations and the above case apart.

Remove the cancel tracing in hrtimer_start() and add a 'was_armed'
indicator to the hrtimer start tracepoint, which clearly shows what the
state of the hrtimer is when hrtimer_start() is invoked:

 -.. [032] d.h1. 6.200103: hrtimer_start: hrtimer= .... was_armed=0
 -.. [032] d.h1. 6.200558: hrtimer_start: hrtimer= .... was_armed=1

Fixes: c6a2a1770245 ("hrtimer: Add tracepoint for hrtimers")
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/trace/events/timer.h |   11 +++++++----
 kernel/time/hrtimer.c        |   43 ++++++++++++++++++++-----------------------
 2 files changed, 27 insertions(+), 27 deletions(-)

--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -218,12 +218,13 @@ TRACE_EVENT(hrtimer_setup,
  * hrtimer_start - called when the hrtimer is started
  * @hrtimer:	pointer to struct hrtimer
  * @mode:	the hrtimers mode
+ * @was_armed:	Was armed when hrtimer_start*() was invoked
  */
 TRACE_EVENT(hrtimer_start,

-	TP_PROTO(struct hrtimer *hrtimer, enum hrtimer_mode mode),
+	TP_PROTO(struct hrtimer *hrtimer, enum hrtimer_mode mode, bool was_armed),

-	TP_ARGS(hrtimer, mode),
+	TP_ARGS(hrtimer, mode, was_armed),

 	TP_STRUCT__entry(
 		__field( void *,	hrtimer		)
@@ -231,6 +232,7 @@ TRACE_EVENT(hrtimer_start,
 		__field( s64,		expires		)
 		__field( s64,		softexpires	)
 		__field( enum hrtimer_mode,	mode	)
+		__field( bool,		was_armed	)
 	),

 	TP_fast_assign(
@@ -239,13 +241,14 @@ TRACE_EVENT(hrtimer_start,
 		__entry->expires = hrtimer_get_expires(hrtimer);
 		__entry->softexpires = hrtimer_get_softexpires(hrtimer);
 		__entry->mode = mode;
+		__entry->was_armed = was_armed;
 	),

 	TP_printk("hrtimer=%p function=%ps expires=%llu softexpires=%llu "
-		  "mode=%s", __entry->hrtimer, __entry->function,
+		  "mode=%s was_armed=%d", __entry->hrtimer, __entry->function,
 		  (unsigned long long) __entry->expires,
 		  (unsigned long long) __entry->softexpires,
-		  decode_hrtimer_mode(__entry->mode))
+		  decode_hrtimer_mode(__entry->mode), __entry->was_armed)
 );

 /**
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -529,17 +529,10 @@ static inline void debug_setup_on_stack(
 	trace_hrtimer_setup(timer, clockid, mode);
 }

-static inline void debug_activate(struct hrtimer *timer,
-				  enum hrtimer_mode mode)
+static inline void debug_activate(struct hrtimer *timer, enum hrtimer_mode mode, bool was_armed)
 {
 	debug_hrtimer_activate(timer, mode);
-	trace_hrtimer_start(timer, mode);
-}
-
-static inline void debug_deactivate(struct hrtimer *timer)
-{
-	debug_hrtimer_deactivate(timer);
-	trace_hrtimer_cancel(timer);
+	trace_hrtimer_start(timer, mode, was_armed);
 }

 static struct hrtimer_clock_base *
@@ -1137,9 +1130,9 @@ EXPORT_SYMBOL_GPL(hrtimer_forward);
  * Returns true when the new timer is the leftmost timer in the tree.
  */
 static bool enqueue_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
-			    enum hrtimer_mode mode)
+			    enum hrtimer_mode mode, bool was_armed)
 {
-	debug_activate(timer, mode);
+	debug_activate(timer, mode, was_armed);
 	WARN_ON_ONCE(!base->cpu_base->online);

 	base->cpu_base->active_bases |= 1 << base->index;
@@ -1199,6 +1192,8 @@ remove_hrtimer(struct hrtimer *timer, st
 	if (state & HRTIMER_STATE_ENQUEUED) {
 		bool reprogram;

+		debug_hrtimer_deactivate(timer);
+
 		/*
 		 * Remove the timer and force reprogramming when high
 		 * resolution mode is active and the timer is on the current
@@ -1207,7 +1202,6 @@ remove_hrtimer(struct hrtimer *timer, st
 		 * reprogramming happens in the interrupt handler. This is a
 		 * rare case and less expensive than a smp call.
 		 */
-		debug_deactivate(timer);
 		reprogram = base->cpu_base == this_cpu_ptr(&hrtimer_bases);

 		/*
@@ -1274,15 +1268,15 @@ static int __hrtimer_start_range_ns(stru
 {
 	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
 	struct hrtimer_clock_base *new_base;
-	bool force_local, first;
+	bool force_local, first, was_armed;

 	/*
 	 * If the timer is on the local cpu base and is the first expiring
 	 * timer then this might end up reprogramming the hardware twice
-	 * (on removal and on enqueue). To avoid that by prevent the
-	 * reprogram on removal, keep the timer local to the current CPU
-	 * and enforce reprogramming after it is queued no matter whether
-	 * it is the new first expiring timer again or not.
+	 * (on removal and on enqueue). To avoid that prevent the reprogram
+	 * on removal, keep the timer local to the current CPU and enforce
+	 * reprogramming after it is queued no matter whether it is the new
+	 * first expiring timer again or not.
 	 */
 	force_local = base->cpu_base == this_cpu_base;
 	force_local &= base->cpu_base->next_timer == timer;
@@ -1304,7 +1298,7 @@ static int __hrtimer_start_range_ns(stru
 	 * avoids programming the underlying clock event twice (once at
 	 * removal and once after enqueue).
 	 */
-	remove_hrtimer(timer, base, true, force_local);
+	was_armed = remove_hrtimer(timer, base, true, force_local);

 	if (mode & HRTIMER_MODE_REL)
 		tim = ktime_add_safe(tim, __hrtimer_cb_get_time(base->clockid));
@@ -1321,7 +1315,7 @@ static int __hrtimer_start_range_ns(stru
 		new_base = base;
 	}

-	first = enqueue_hrtimer(timer, new_base, mode);
+	first = enqueue_hrtimer(timer, new_base, mode, was_armed);

 	/*
 	 * If the hrtimer interrupt is running, then it will reevaluate the
@@ -1439,8 +1433,11 @@ int hrtimer_try_to_cancel(struct hrtimer

 	base = lock_hrtimer_base(timer, &flags);

-	if (!hrtimer_callback_running(timer))
+	if (!hrtimer_callback_running(timer)) {
 		ret = remove_hrtimer(timer, base, false, false);
+		if (ret)
+			trace_hrtimer_cancel(timer);
+	}

 	unlock_hrtimer_base(timer, &flags);

@@ -1877,7 +1874,7 @@ static void __run_hrtimer(struct hrtimer
 	 */
 	if (restart != HRTIMER_NORESTART &&
 	    !(timer->state & HRTIMER_STATE_ENQUEUED))
-		enqueue_hrtimer(timer, base, HRTIMER_MODE_ABS);
+		enqueue_hrtimer(timer, base, HRTIMER_MODE_ABS, false);

 	/*
 	 * Separate the ->running assignment from the ->state assignment.
@@ -2356,7 +2353,7 @@ static void migrate_hrtimer_list(struct
 	while ((node = timerqueue_getnext(&old_base->active))) {
 		timer = container_of(node, struct hrtimer, node);
 		BUG_ON(hrtimer_callback_running(timer));
-		debug_deactivate(timer);
+		debug_hrtimer_deactivate(timer);

 		/*
 		 * Mark it as ENQUEUED not INACTIVE otherwise the
@@ -2373,7 +2370,7 @@ static void migrate_hrtimer_list(struct
 		 * sort out already expired timers and reprogram the
 		 * event device.
 		 */
-		enqueue_hrtimer(timer, new_base, HRTIMER_MODE_ABS);
+		enqueue_hrtimer(timer, new_base, HRTIMER_MODE_ABS, true);
 	}
 }

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:04 +0100
Message-ID: <20260224163430.275551488@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 23/48] hrtimer: Use guards where appropriate
References: <20260224163022.795809588@kernel.org>

Simplify and tidy up the code where possible.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c |   48 +++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 33 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -838,13 +838,12 @@ static void retrigger_next_event(void *a
 	 * In periodic low resolution mode, the next softirq expiration
 	 * must also be updated.
 	 */
-	raw_spin_lock(&base->lock);
+	guard(raw_spinlock)(&base->lock);
 	hrtimer_update_base(base);
 	if (hrtimer_hres_active(base))
 		hrtimer_force_reprogram(base, 0);
 	else
 		hrtimer_update_next_event(base);
-	raw_spin_unlock(&base->lock);
 }

 /*
@@ -994,7 +993,6 @@ static bool update_needs_ipi(struct hrti
 void clock_was_set(unsigned int bases)
 {
 	cpumask_var_t mask;
-	int cpu;

 	if (!hrtimer_highres_enabled() && !tick_nohz_is_active())
 		goto out_timerfd;
@@ -1005,24 +1003,19 @@ void clock_was_set(unsigned int bases)
 	}

 	/* Avoid interrupting CPUs if possible */
-	cpus_read_lock();
-	for_each_online_cpu(cpu) {
-		struct hrtimer_cpu_base *cpu_base;
-		unsigned long flags;
+	scoped_guard(cpus_read_lock) {
+		int cpu;

-		cpu_base = &per_cpu(hrtimer_bases, cpu);
-		raw_spin_lock_irqsave(&cpu_base->lock, flags);
+		for_each_online_cpu(cpu) {
+			struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);

-		if (update_needs_ipi(cpu_base, bases))
-			cpumask_set_cpu(cpu, mask);
-
-		raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+			guard(raw_spinlock_irqsave)(&cpu_base->lock);
+			if (update_needs_ipi(cpu_base, bases))
+				cpumask_set_cpu(cpu, mask);
+		}
+		scoped_guard(preempt)
+			smp_call_function_many(mask, retrigger_next_event, NULL, 1);
 	}
-
-	preempt_disable();
-	smp_call_function_many(mask, retrigger_next_event, NULL, 1);
-	preempt_enable();
-	cpus_read_unlock();
 	free_cpumask_var(mask);

 out_timerfd:
@@ -1600,15 +1593,11 @@ u64 hrtimer_get_next_event(void)
 {
 	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
 	u64 expires = KTIME_MAX;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&cpu_base->lock, flags);

+	guard(raw_spinlock_irqsave)(&cpu_base->lock);
 	if (!hrtimer_hres_active(cpu_base))
 		expires = __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_ALL);

-	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
-
 	return expires;
 }

@@ -1623,25 +1612,18 @@ u64 hrtimer_next_event_without(const str
 {
 	struct hrtimer_cpu_base *cpu_base =
		this_cpu_ptr(&hrtimer_bases);
 	u64 expires = KTIME_MAX;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&cpu_base->lock, flags);

+	guard(raw_spinlock_irqsave)(&cpu_base->lock);
 	if (hrtimer_hres_active(cpu_base)) {
 		unsigned int active;

 		if (!cpu_base->softirq_activated) {
 			active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
-			expires = __hrtimer_next_event_base(cpu_base, exclude,
-							    active, KTIME_MAX);
+			expires = __hrtimer_next_event_base(cpu_base, exclude, active, KTIME_MAX);
 		}
 		active = cpu_base->active_bases & HRTIMER_ACTIVE_HARD;
-		expires = __hrtimer_next_event_base(cpu_base, exclude, active,
-						    expires);
+		expires = __hrtimer_next_event_base(cpu_base, exclude, active, expires);
 	}
-
-	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
-
 	return expires;
 }
 #endif

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:09 +0100
Message-ID: <20260224163430.342740952@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 24/48] hrtimer: Cleanup coding style and comments
References: <20260224163022.795809588@kernel.org>

As this code has some major surgery ahead, clean up the coding style and
bring the comments up to date. No functional change intended.
Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- kernel/time/hrtimer.c | 364 +++++++++++++++++++--------------------------= ----- 1 file changed, 143 insertions(+), 221 deletions(-) --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -77,43 +77,22 @@ static ktime_t __hrtimer_cb_get_time(clo * to reach a base using a clockid, hrtimer_clockid_to_base() * is used to convert from clockid to the proper hrtimer_base_type. */ + +#define BASE_INIT(idx, cid) \ + [idx] =3D { .index =3D idx, .clockid =3D cid } + DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =3D { .lock =3D __RAW_SPIN_LOCK_UNLOCKED(hrtimer_bases.lock), - .clock_base =3D - { - { - .index =3D HRTIMER_BASE_MONOTONIC, - .clockid =3D CLOCK_MONOTONIC, - }, - { - .index =3D HRTIMER_BASE_REALTIME, - .clockid =3D CLOCK_REALTIME, - }, - { - .index =3D HRTIMER_BASE_BOOTTIME, - .clockid =3D CLOCK_BOOTTIME, - }, - { - .index =3D HRTIMER_BASE_TAI, - .clockid =3D CLOCK_TAI, - }, - { - .index =3D HRTIMER_BASE_MONOTONIC_SOFT, - .clockid =3D CLOCK_MONOTONIC, - }, - { - .index =3D HRTIMER_BASE_REALTIME_SOFT, - .clockid =3D CLOCK_REALTIME, - }, - { - .index =3D HRTIMER_BASE_BOOTTIME_SOFT, - .clockid =3D CLOCK_BOOTTIME, - }, - { - .index =3D HRTIMER_BASE_TAI_SOFT, - .clockid =3D CLOCK_TAI, - }, + .clock_base =3D { + BASE_INIT(HRTIMER_BASE_MONOTONIC, CLOCK_MONOTONIC), + BASE_INIT(HRTIMER_BASE_REALTIME, CLOCK_REALTIME), + BASE_INIT(HRTIMER_BASE_BOOTTIME, CLOCK_BOOTTIME), + BASE_INIT(HRTIMER_BASE_TAI, CLOCK_TAI), + BASE_INIT(HRTIMER_BASE_MONOTONIC_SOFT, CLOCK_MONOTONIC), + BASE_INIT(HRTIMER_BASE_REALTIME_SOFT, CLOCK_REALTIME), + BASE_INIT(HRTIMER_BASE_BOOTTIME_SOFT, CLOCK_BOOTTIME), + BASE_INIT(HRTIMER_BASE_TAI_SOFT, CLOCK_TAI), }, .csd =3D CSD_INIT(retrigger_next_event, NULL) }; @@ -150,18 +129,19 @@ static inline void hrtimer_schedule_hres * single place */ #ifdef CONFIG_SMP - /* * We require the migration_base for lock_hrtimer_base()/switch_hrtimer_ba= se() * such that hrtimer_callback_running() 
can unconditionally dereference * timer->base->cpu_base */ static struct hrtimer_cpu_base migration_cpu_base =3D { - .clock_base =3D { { - .cpu_base =3D &migration_cpu_base, - .seq =3D SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq, - &migration_cpu_base.lock), - }, }, + .clock_base =3D { + [0] =3D { + .cpu_base =3D &migration_cpu_base, + .seq =3D SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq, + &migration_cpu_base.lock), + }, + }, }; =20 #define migration_base migration_cpu_base.clock_base[0] @@ -178,15 +158,13 @@ static struct hrtimer_cpu_base migration * possible to set timer->base =3D &migration_base and drop the lock: the = timer * remains locked. */ -static -struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer, - unsigned long *flags) +static struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *= timer, + unsigned long *flags) __acquires(&timer->base->lock) { - struct hrtimer_clock_base *base; - for (;;) { - base =3D READ_ONCE(timer->base); + struct hrtimer_clock_base *base =3D READ_ONCE(timer->base); + if (likely(base !=3D &migration_base)) { raw_spin_lock_irqsave(&base->cpu_base->lock, *flags); if (likely(base =3D=3D timer->base)) @@ -239,7 +217,7 @@ static bool hrtimer_suitable_target(stru return expires >=3D new_base->cpu_base->expires_next; } =20 -static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_= base *base, int pinned) +static inline struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_= base *base, bool pinned) { if (!hrtimer_base_is_online(base)) { int cpu =3D cpumask_any_and(cpu_online_mask, housekeeping_cpumask(HK_TYP= E_TIMER)); @@ -267,8 +245,7 @@ static inline struct hrtimer_cpu_base *g * the timer callback is currently running. 
*/ static inline struct hrtimer_clock_base * -switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base, - int pinned) +switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base= , bool pinned) { struct hrtimer_cpu_base *new_cpu_base, *this_cpu_base; struct hrtimer_clock_base *new_base; @@ -281,13 +258,12 @@ switch_hrtimer_base(struct hrtimer *time =20 if (base !=3D new_base) { /* - * We are trying to move timer to new_base. - * However we can't change timer's base while it is running, - * so we keep it on the same CPU. No hassle vs. reprogramming - * the event source in the high resolution case. The softirq - * code will take care of this when the timer function has - * completed. There is no conflict as we hold the lock until - * the timer is enqueued. + * We are trying to move timer to new_base. However we can't + * change timer's base while it is running, so we keep it on + * the same CPU. No hassle vs. reprogramming the event source + * in the high resolution case. The remote CPU will take care + * of this when the timer function has completed. There is no + * conflict as we hold the lock until the timer is enqueued. 
*/ if (unlikely(hrtimer_callback_running(timer))) return base; @@ -297,8 +273,7 @@ switch_hrtimer_base(struct hrtimer *time raw_spin_unlock(&base->cpu_base->lock); raw_spin_lock(&new_base->cpu_base->lock); =20 - if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, - this_cpu_base)) { + if (!hrtimer_suitable_target(timer, new_base, new_cpu_base, this_cpu_bas= e)) { raw_spin_unlock(&new_base->cpu_base->lock); raw_spin_lock(&base->cpu_base->lock); new_cpu_base =3D this_cpu_base; @@ -317,14 +292,13 @@ switch_hrtimer_base(struct hrtimer *time =20 #else /* CONFIG_SMP */ =20 -static inline struct hrtimer_clock_base * -lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags) +static inline struct hrtimer_clock_base *lock_hrtimer_base(const struct hr= timer *timer, + unsigned long *flags) __acquires(&timer->base->cpu_base->lock) { struct hrtimer_clock_base *base =3D timer->base; =20 raw_spin_lock_irqsave(&base->cpu_base->lock, *flags); - return base; } =20 @@ -484,8 +458,7 @@ static inline void debug_hrtimer_init_on debug_object_init_on_stack(timer, &hrtimer_debug_descr); } =20 -static inline void debug_hrtimer_activate(struct hrtimer *timer, - enum hrtimer_mode mode) +static inline void debug_hrtimer_activate(struct hrtimer *timer, enum hrti= mer_mode mode) { debug_object_activate(timer, &hrtimer_debug_descr); } @@ -510,8 +483,7 @@ EXPORT_SYMBOL_GPL(destroy_hrtimer_on_sta =20 static inline void debug_hrtimer_init(struct hrtimer *timer) { } static inline void debug_hrtimer_init_on_stack(struct hrtimer *timer) { } -static inline void debug_hrtimer_activate(struct hrtimer *timer, - enum hrtimer_mode mode) { } +static inline void debug_hrtimer_activate(struct hrtimer *timer, enum hrti= mer_mode mode) { } static inline void debug_hrtimer_deactivate(struct hrtimer *timer) { } static inline void debug_hrtimer_assert_init(struct hrtimer *timer) { } #endif @@ -549,13 +521,12 @@ static struct hrtimer_clock_base * return &cpu_base->clock_base[idx]; } =20 -#define 
for_each_active_base(base, cpu_base, active) \ +#define for_each_active_base(base, cpu_base, active) \ while ((base =3D __next_base((cpu_base), &(active)))) =20 static ktime_t __hrtimer_next_event_base(struct hrtimer_cpu_base *cpu_base, const struct hrtimer *exclude, - unsigned int active, - ktime_t expires_next) + unsigned int active, ktime_t expires_next) { struct hrtimer_clock_base *base; ktime_t expires; @@ -618,29 +589,24 @@ static ktime_t __hrtimer_next_event_base * - HRTIMER_ACTIVE_SOFT, or * - HRTIMER_ACTIVE_HARD. */ -static ktime_t -__hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base, unsigned int a= ctive_mask) +static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base,= unsigned int active_mask) { - unsigned int active; struct hrtimer *next_timer =3D NULL; ktime_t expires_next =3D KTIME_MAX; + unsigned int active; =20 if (!cpu_base->softirq_activated && (active_mask & HRTIMER_ACTIVE_SOFT)) { active =3D cpu_base->active_bases & HRTIMER_ACTIVE_SOFT; cpu_base->softirq_next_timer =3D NULL; - expires_next =3D __hrtimer_next_event_base(cpu_base, NULL, - active, KTIME_MAX); - + expires_next =3D __hrtimer_next_event_base(cpu_base, NULL, active, KTIME= _MAX); next_timer =3D cpu_base->softirq_next_timer; } =20 if (active_mask & HRTIMER_ACTIVE_HARD) { active =3D cpu_base->active_bases & HRTIMER_ACTIVE_HARD; cpu_base->next_timer =3D next_timer; - expires_next =3D __hrtimer_next_event_base(cpu_base, NULL, active, - expires_next); + expires_next =3D __hrtimer_next_event_base(cpu_base, NULL, active, expir= es_next); } - return expires_next; } =20 @@ -681,8 +647,8 @@ static inline ktime_t hrtimer_update_bas ktime_t *offs_boot =3D &base->clock_base[HRTIMER_BASE_BOOTTIME].offset; ktime_t *offs_tai =3D &base->clock_base[HRTIMER_BASE_TAI].offset; =20 - ktime_t now =3D ktime_get_update_offsets_now(&base->clock_was_set_seq, - offs_real, offs_boot, offs_tai); + ktime_t now =3D ktime_get_update_offsets_now(&base->clock_was_set_seq, of= fs_real, + 
offs_boot, offs_tai); =20 base->clock_base[HRTIMER_BASE_REALTIME_SOFT].offset =3D *offs_real; base->clock_base[HRTIMER_BASE_BOOTTIME_SOFT].offset =3D *offs_boot; @@ -702,8 +668,7 @@ static inline int hrtimer_hres_active(st cpu_base->hres_active : 0; } =20 -static void __hrtimer_reprogram(struct hrtimer_cpu_base *cpu_base, - struct hrtimer *next_timer, +static void __hrtimer_reprogram(struct hrtimer_cpu_base *cpu_base, struct = hrtimer *next_timer, ktime_t expires_next) { cpu_base->expires_next =3D expires_next; @@ -736,12 +701,9 @@ static void __hrtimer_reprogram(struct h * next event * Called with interrupts disabled and base->lock held */ -static void -hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) +static void hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, boo= l skip_equal) { - ktime_t expires_next; - - expires_next =3D hrtimer_update_next_event(cpu_base); + ktime_t expires_next =3D hrtimer_update_next_event(cpu_base); =20 if (skip_equal && expires_next =3D=3D cpu_base->expires_next) return; @@ -752,41 +714,31 @@ hrtimer_force_reprogram(struct hrtimer_c /* High resolution timer related functions */ #ifdef CONFIG_HIGH_RES_TIMERS =20 -/* - * High resolution timer enabled ? - */ +/* High resolution timer enabled ? 
*/ static bool hrtimer_hres_enabled __read_mostly =3D true; unsigned int hrtimer_resolution __read_mostly =3D LOW_RES_NSEC; EXPORT_SYMBOL_GPL(hrtimer_resolution); =20 -/* - * Enable / Disable high resolution mode - */ +/* Enable / Disable high resolution mode */ static int __init setup_hrtimer_hres(char *str) { return (kstrtobool(str, &hrtimer_hres_enabled) =3D=3D 0); } - __setup("highres=3D", setup_hrtimer_hres); =20 -/* - * hrtimer_high_res_enabled - query, if the highres mode is enabled - */ -static inline int hrtimer_is_hres_enabled(void) +/* hrtimer_high_res_enabled - query, if the highres mode is enabled */ +static inline bool hrtimer_is_hres_enabled(void) { return hrtimer_hres_enabled; } =20 -/* - * Switch to high resolution mode - */ +/* Switch to high resolution mode */ static void hrtimer_switch_to_hres(void) { struct hrtimer_cpu_base *base =3D this_cpu_ptr(&hrtimer_bases); =20 if (tick_init_highres()) { - pr_warn("Could not switch to high resolution mode on CPU %u\n", - base->cpu); + pr_warn("Could not switch to high resolution mode on CPU %u\n", base->cp= u); return; } base->hres_active =3D 1; @@ -800,10 +752,11 @@ static void hrtimer_switch_to_hres(void) =20 #else =20 -static inline int hrtimer_is_hres_enabled(void) { return 0; } +static inline bool hrtimer_is_hres_enabled(void) { return 0; } static inline void hrtimer_switch_to_hres(void) { } =20 #endif /* CONFIG_HIGH_RES_TIMERS */ + /* * Retrigger next event is called after clock was set with interrupts * disabled through an SMP function call or directly from low level @@ -841,7 +794,7 @@ static void retrigger_next_event(void *a guard(raw_spinlock)(&base->lock); hrtimer_update_base(base); if (hrtimer_hres_active(base)) - hrtimer_force_reprogram(base, 0); + hrtimer_force_reprogram(base, /* skip_equal */ false); else hrtimer_update_next_event(base); } @@ -887,8 +840,7 @@ static void hrtimer_reprogram(struct hrt timer_cpu_base->softirq_next_timer =3D timer; timer_cpu_base->softirq_expires_next =3D 
expires; =20 - if (!ktime_before(expires, timer_cpu_base->expires_next) || - !reprogram) + if (!ktime_before(expires, timer_cpu_base->expires_next) || !reprogram) return; } =20 @@ -914,8 +866,7 @@ static void hrtimer_reprogram(struct hrt __hrtimer_reprogram(cpu_base, timer, expires); } =20 -static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base, - unsigned int active) +static bool update_needs_ipi(struct hrtimer_cpu_base *cpu_base, unsigned i= nt active) { struct hrtimer_clock_base *base; unsigned int seq; @@ -1050,11 +1001,8 @@ void hrtimers_resume_local(void) retrigger_next_event(NULL); } =20 -/* - * Counterpart to lock_hrtimer_base above: - */ -static inline -void unlock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags) +/* Counterpart to lock_hrtimer_base above */ +static inline void unlock_hrtimer_base(const struct hrtimer *timer, unsign= ed long *flags) __releases(&timer->base->cpu_base->lock) { raw_spin_unlock_irqrestore(&timer->base->cpu_base->lock, *flags); @@ -1071,7 +1019,7 @@ void unlock_hrtimer_base(const struct hr * .. note:: * This only updates the timer expiry value and does not requeue the time= r. * - * There is also a variant of the function hrtimer_forward_now(). + * There is also a variant of this function: hrtimer_forward_now(). * * Context: Can be safely called from the callback function of @timer. If = called * from other contexts @timer must neither be enqueued nor runnin= g the @@ -1081,8 +1029,8 @@ void unlock_hrtimer_base(const struct hr */ u64 hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval) { - u64 orun =3D 1; ktime_t delta; + u64 orun =3D 1; =20 delta =3D ktime_sub(now, hrtimer_get_expires(timer)); =20 @@ -1118,13 +1066,15 @@ EXPORT_SYMBOL_GPL(hrtimer_forward); * enqueue_hrtimer - internal function to (re)start a timer * * The timer is inserted in expiry order. Insertion into the - * red black tree is O(log(n)). Must hold the base lock. + * red black tree is O(log(n)). 
* * Returns true when the new timer is the leftmost timer in the tree. */ static bool enqueue_hrtimer(struct hrtimer *timer, struct hrtimer_clock_ba= se *base, enum hrtimer_mode mode, bool was_armed) { + lockdep_assert_held(&base->cpu_base->lock); + debug_activate(timer, mode, was_armed); WARN_ON_ONCE(!base->cpu_base->online); =20 @@ -1139,20 +1089,19 @@ static bool enqueue_hrtimer(struct hrtim /* * __remove_hrtimer - internal function to remove a timer * - * Caller must hold the base lock. - * * High resolution timer mode reprograms the clock event device when the * timer is the one which expires next. The caller can disable this by set= ting * reprogram to zero. This is useful, when the context does a reprogramming * anyway (e.g. timer interrupt) */ -static void __remove_hrtimer(struct hrtimer *timer, - struct hrtimer_clock_base *base, - u8 newstate, int reprogram) +static void __remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_b= ase *base, + u8 newstate, bool reprogram) { struct hrtimer_cpu_base *cpu_base =3D base->cpu_base; u8 state =3D timer->state; =20 + lockdep_assert_held(&cpu_base->lock); + /* Pairs with the lockless read in hrtimer_is_queued() */ WRITE_ONCE(timer->state, newstate); if (!(state & HRTIMER_STATE_ENQUEUED)) @@ -1162,26 +1111,25 @@ static void __remove_hrtimer(struct hrti cpu_base->active_bases &=3D ~(1 << base->index); =20 /* - * Note: If reprogram is false we do not update - * cpu_base->next_timer. This happens when we remove the first - * timer on a remote cpu. No harm as we never dereference - * cpu_base->next_timer. So the worst thing what can happen is - * an superfluous call to hrtimer_force_reprogram() on the - * remote cpu later on if the same timer gets enqueued again. + * If reprogram is false don't update cpu_base->next_timer and do not + * touch the clock event device. + * + * This happens when removing the first timer on a remote CPU, which + * will be handled by the remote CPU's interrupt. 
It also happens when + * a local timer is removed to be immediately restarted. That's handled + * at the call site. */ if (reprogram && timer =3D=3D cpu_base->next_timer && !timer->is_lazy) - hrtimer_force_reprogram(cpu_base, 1); + hrtimer_force_reprogram(cpu_base, /* skip_equal */ true); } =20 -/* - * remove hrtimer, called with base lock held - */ -static inline int -remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, - bool restart, bool keep_local) +static inline bool remove_hrtimer(struct hrtimer *timer, struct hrtimer_cl= ock_base *base, + bool restart, bool keep_local) { u8 state =3D timer->state; =20 + lockdep_assert_held(&base->cpu_base->lock); + if (state & HRTIMER_STATE_ENQUEUED) { bool reprogram; =20 @@ -1209,9 +1157,9 @@ remove_hrtimer(struct hrtimer *timer, st reprogram &=3D !keep_local; =20 __remove_hrtimer(timer, base, state, reprogram); - return 1; + return true; } - return 0; + return false; } =20 static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t= tim, @@ -1230,34 +1178,27 @@ static inline ktime_t hrtimer_update_low return tim; } =20 -static void -hrtimer_update_softirq_timer(struct hrtimer_cpu_base *cpu_base, bool repro= gram) +static void hrtimer_update_softirq_timer(struct hrtimer_cpu_base *cpu_base= , bool reprogram) { - ktime_t expires; + ktime_t expires =3D __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_SOF= T); =20 /* - * Find the next SOFT expiration. - */ - expires =3D __hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_SOFT); - - /* - * reprogramming needs to be triggered, even if the next soft - * hrtimer expires at the same time than the next hard + * Reprogramming needs to be triggered, even if the next soft + * hrtimer expires at the same time as the next hard * hrtimer. cpu_base->softirq_expires_next needs to be updated! 
*/ if (expires =3D=3D KTIME_MAX) return; =20 /* - * cpu_base->*next_timer is recomputed by __hrtimer_get_next_event() - * cpu_base->*expires_next is only set by hrtimer_reprogram() + * cpu_base->next_timer is recomputed by __hrtimer_get_next_event() + * cpu_base->expires_next is only set by hrtimer_reprogram() */ hrtimer_reprogram(cpu_base->softirq_next_timer, reprogram); } =20 -static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, - u64 delta_ns, const enum hrtimer_mode mode, - struct hrtimer_clock_base *base) +static bool __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, u= 64 delta_ns, + const enum hrtimer_mode mode, struct hrtimer_clock_base *base) { struct hrtimer_cpu_base *this_cpu_base =3D this_cpu_ptr(&hrtimer_bases); struct hrtimer_clock_base *new_base; @@ -1301,12 +1242,10 @@ static int __hrtimer_start_range_ns(stru hrtimer_set_expires_range_ns(timer, tim, delta_ns); =20 /* Switch the timer base, if necessary: */ - if (!force_local) { - new_base =3D switch_hrtimer_base(timer, base, - mode & HRTIMER_MODE_PINNED); - } else { + if (!force_local) + new_base =3D switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED= ); + else new_base =3D base; - } =20 first =3D enqueue_hrtimer(timer, new_base, mode, was_armed); =20 @@ -1319,9 +1258,12 @@ static int __hrtimer_start_range_ns(stru =20 if (!force_local) { /* - * If the current CPU base is online, then the timer is - * never queued on a remote CPU if it would be the first - * expiring timer there. + * If the current CPU base is online, then the timer is never + * queued on a remote CPU if it would be the first expiring + * timer there unless the timer callback is currently executed + * on the remote CPU. In the latter case the remote CPU will + * re-evaluate the first expiring timer after completing the + * callbacks. 
*/ if (hrtimer_base_is_online(this_cpu_base)) return first; @@ -1336,7 +1278,7 @@ static int __hrtimer_start_range_ns(stru =20 smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd); } - return 0; + return false; } =20 /* @@ -1350,7 +1292,7 @@ static int __hrtimer_start_range_ns(stru */ if (timer->is_lazy) { if (new_base->cpu_base->expires_next <=3D hrtimer_get_expires(timer)) - return 0; + return false; } =20 /* @@ -1358,8 +1300,8 @@ static int __hrtimer_start_range_ns(stru * reprogramming on removal and enqueue. Force reprogram the * hardware by evaluating the new first expiring timer. */ - hrtimer_force_reprogram(new_base->cpu_base, 1); - return 0; + hrtimer_force_reprogram(new_base->cpu_base, /* skip_equal */ true); + return false; } =20 /** @@ -1371,8 +1313,8 @@ static int __hrtimer_start_range_ns(stru * relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED); * softirq based mode is considered for debug purpose only! */ -void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, - u64 delta_ns, const enum hrtimer_mode mode) +void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, u64 delta_= ns, + const enum hrtimer_mode mode) { struct hrtimer_clock_base *base; unsigned long flags; @@ -1464,8 +1406,7 @@ static void hrtimer_cpu_base_unlock_expi * the timer callback to finish. Drop expiry_lock and reacquire it. That * allows the waiter to acquire the lock and make progress. 
*/ -static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base, - unsigned long flags) +static void hrtimer_sync_wait_running(struct hrtimer_cpu_base *cpu_base, u= nsigned long flags) { if (atomic_read(&cpu_base->timer_waiters)) { raw_spin_unlock_irqrestore(&cpu_base->lock, flags); @@ -1530,14 +1471,10 @@ void hrtimer_cancel_wait_running(const s spin_unlock_bh(&base->cpu_base->softirq_expiry_lock); } #else -static inline void -hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_base *base) { } -static inline void -hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *base) { } -static inline void -hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base *base) { } -static inline void hrtimer_sync_wait_running(struct hrtimer_cpu_base *base, - unsigned long flags) { } +static inline void hrtimer_cpu_base_init_expiry_lock(struct hrtimer_cpu_ba= se *base) { } +static inline void hrtimer_cpu_base_lock_expiry(struct hrtimer_cpu_base *b= ase) { } +static inline void hrtimer_cpu_base_unlock_expiry(struct hrtimer_cpu_base = *base) { } +static inline void hrtimer_sync_wait_running(struct hrtimer_cpu_base *base= , unsigned long fl) { } #endif =20 /** @@ -1668,8 +1605,7 @@ ktime_t hrtimer_cb_get_time(const struct } EXPORT_SYMBOL_GPL(hrtimer_cb_get_time); =20 -static void __hrtimer_setup(struct hrtimer *timer, - enum hrtimer_restart (*function)(struct hrtimer *), +static void __hrtimer_setup(struct hrtimer *timer, enum hrtimer_restart (*= fn)(struct hrtimer *), clockid_t clock_id, enum hrtimer_mode mode) { bool softtimer =3D !!(mode & HRTIMER_MODE_SOFT); @@ -1705,10 +1641,10 @@ static void __hrtimer_setup(struct hrtim timer->base =3D &cpu_base->clock_base[base]; timerqueue_init(&timer->node); =20 - if (WARN_ON_ONCE(!function)) + if (WARN_ON_ONCE(!fn)) ACCESS_PRIVATE(timer, function) =3D hrtimer_dummy_timeout; else - ACCESS_PRIVATE(timer, function) =3D function; + ACCESS_PRIVATE(timer, function) =3D fn; } =20 /** @@ -1767,12 +1703,10 @@ bool hrtimer_active(const 
struct hrtimer base =3D READ_ONCE(timer->base); seq =3D raw_read_seqcount_begin(&base->seq); =20 - if (timer->state !=3D HRTIMER_STATE_INACTIVE || - base->running =3D=3D timer) + if (timer->state !=3D HRTIMER_STATE_INACTIVE || base->running =3D=3D tim= er) return true; =20 - } while (read_seqcount_retry(&base->seq, seq) || - base !=3D READ_ONCE(timer->base)); + } while (read_seqcount_retry(&base->seq, seq) || base !=3D READ_ONCE(time= r->base)); =20 return false; } @@ -1795,11 +1729,9 @@ EXPORT_SYMBOL_GPL(hrtimer_active); * a false negative if the read side got smeared over multiple consecutive * __run_hrtimer() invocations. */ - -static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, - struct hrtimer_clock_base *base, - struct hrtimer *timer, ktime_t *now, - unsigned long flags) __must_hold(&cpu_base->lock) +static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, struct hrtime= r_clock_base *base, + struct hrtimer *timer, ktime_t *now, unsigned long flags) + __must_hold(&cpu_base->lock) { enum hrtimer_restart (*fn)(struct hrtimer *); bool expires_in_hardirq; @@ -1819,7 +1751,7 @@ static void __run_hrtimer(struct hrtimer */ raw_write_seqcount_barrier(&base->seq); =20 - __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0); + __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, false); fn =3D ACCESS_PRIVATE(timer, function); =20 /* @@ -1854,8 +1786,7 @@ static void __run_hrtimer(struct hrtimer * hrtimer_start_range_ns() can have popped in and enqueued the timer * for us already. 
*/ - if (restart !=3D HRTIMER_NORESTART && - !(timer->state & HRTIMER_STATE_ENQUEUED)) + if (restart !=3D HRTIMER_NORESTART && !(timer->state & HRTIMER_STATE_ENQU= EUED)) enqueue_hrtimer(timer, base, HRTIMER_MODE_ABS, false); =20 /* @@ -1874,8 +1805,8 @@ static void __run_hrtimer(struct hrtimer static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_= t now, unsigned long flags, unsigned int active_mask) { - struct hrtimer_clock_base *base; unsigned int active =3D cpu_base->active_bases & active_mask; + struct hrtimer_clock_base *base; =20 for_each_active_base(base, cpu_base, active) { struct timerqueue_node *node; @@ -1951,11 +1882,10 @@ void hrtimer_interrupt(struct clock_even retry: cpu_base->in_hrtirq =3D 1; /* - * We set expires_next to KTIME_MAX here with cpu_base->lock - * held to prevent that a timer is enqueued in our queue via - * the migration code. This does not affect enqueueing of - * timers which run their callback and need to be requeued on - * this CPU. + * Set expires_next to KTIME_MAX, which prevents that remote CPUs queue + * timers while __hrtimer_run_queues() is expiring the clock bases. + * Timers which are re/enqueued on the local CPU are not affected by + * this. 
*/ cpu_base->expires_next =3D KTIME_MAX; =20 @@ -2069,8 +1999,7 @@ void hrtimer_run_queues(void) */ static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer) { - struct hrtimer_sleeper *t =3D - container_of(timer, struct hrtimer_sleeper, timer); + struct hrtimer_sleeper *t =3D container_of(timer, struct hrtimer_sleeper,= timer); struct task_struct *task =3D t->task; =20 t->task =3D NULL; @@ -2088,8 +2017,7 @@ static enum hrtimer_restart hrtimer_wake * Wrapper around hrtimer_start_expires() for hrtimer_sleeper based timers * to allow PREEMPT_RT to tweak the delivery mode (soft/hardirq context) */ -void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl, - enum hrtimer_mode mode) +void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl, enum hrtime= r_mode mode) { /* * Make the enqueue delivery mode check work on RT. If the sleeper @@ -2105,8 +2033,8 @@ void hrtimer_sleeper_start_expires(struc } EXPORT_SYMBOL_GPL(hrtimer_sleeper_start_expires); =20 -static void __hrtimer_setup_sleeper(struct hrtimer_sleeper *sl, - clockid_t clock_id, enum hrtimer_mode mode) +static void __hrtimer_setup_sleeper(struct hrtimer_sleeper *sl, clockid_t = clock_id, + enum hrtimer_mode mode) { /* * On PREEMPT_RT enabled kernels hrtimers which are not explicitly @@ -2142,8 +2070,8 @@ static void __hrtimer_setup_sleeper(stru * @clock_id: the clock to be used * @mode: timer mode abs/rel */ -void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, enum hrtimer_mode mode) +void hrtimer_setup_sleeper_on_stack(struct hrtimer_sleeper *sl, clockid_t = clock_id, + enum hrtimer_mode mode) { debug_setup_on_stack(&sl->timer, clock_id, mode); __hrtimer_setup_sleeper(sl, clock_id, mode); @@ -2216,8 +2144,7 @@ static long __sched hrtimer_nanosleep_re return ret; } =20 -long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode, - const clockid_t clockid) +long hrtimer_nanosleep(ktime_t rqtp, const enum hrtimer_mode mode, const c= lockid_t clockid) { 
struct restart_block *restart; struct hrtimer_sleeper t; @@ -2260,8 +2187,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kern current->restart_block.fn =3D do_no_restart_syscall; current->restart_block.nanosleep.type =3D rmtp ? TT_NATIVE : TT_NONE; current->restart_block.nanosleep.rmtp =3D rmtp; - return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, - CLOCK_MONOTONIC); + return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK= _MONOTONIC); } =20 #endif @@ -2269,7 +2195,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kern #ifdef CONFIG_COMPAT_32BIT_TIME =20 SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp, - struct old_timespec32 __user *, rmtp) + struct old_timespec32 __user *, rmtp) { struct timespec64 tu; =20 @@ -2282,8 +2208,7 @@ SYSCALL_DEFINE2(nanosleep_time32, struct current->restart_block.fn =3D do_no_restart_syscall; current->restart_block.nanosleep.type =3D rmtp ? TT_COMPAT : TT_NONE; current->restart_block.nanosleep.compat_rmtp =3D rmtp; - return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, - CLOCK_MONOTONIC); + return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK= _MONOTONIC); } #endif =20 @@ -2293,9 +2218,8 @@ SYSCALL_DEFINE2(nanosleep_time32, struct int hrtimers_prepare_cpu(unsigned int cpu) { struct hrtimer_cpu_base *cpu_base =3D &per_cpu(hrtimer_bases, cpu); - int i; =20 - for (i =3D 0; i < HRTIMER_MAX_CLOCK_BASES; i++) { + for (int i =3D 0; i < HRTIMER_MAX_CLOCK_BASES; i++) { struct hrtimer_clock_base *clock_b =3D &cpu_base->clock_base[i]; =20 clock_b->cpu_base =3D cpu_base; @@ -2329,8 +2253,8 @@ int hrtimers_cpu_starting(unsigned int c static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base, struct hrtimer_clock_base *new_base) { - struct hrtimer *timer; struct timerqueue_node *node; + struct hrtimer *timer; =20 while ((node =3D timerqueue_getnext(&old_base->active))) { timer =3D container_of(node, struct hrtimer, node); @@ -2342,7 +2266,7 @@ static void 
migrate_hrtimer_list(struct * timer could be seen as !active and just vanish away * under us on another CPU */ - __remove_hrtimer(timer, old_base, HRTIMER_STATE_ENQUEUED, 0); + __remove_hrtimer(timer, old_base, HRTIMER_STATE_ENQUEUED, false); timer->base =3D new_base; /* * Enqueue the timers on the new cpu. This does not @@ -2358,7 +2282,7 @@ static void migrate_hrtimer_list(struct =20 int hrtimers_cpu_dying(unsigned int dying_cpu) { - int i, ncpu =3D cpumask_any_and(cpu_active_mask, housekeeping_cpumask(HK_= TYPE_TIMER)); + int ncpu =3D cpumask_any_and(cpu_active_mask, housekeeping_cpumask(HK_TYP= E_TIMER)); struct hrtimer_cpu_base *old_base, *new_base; =20 old_base =3D this_cpu_ptr(&hrtimer_bases); @@ -2371,10 +2295,8 @@ int hrtimers_cpu_dying(unsigned int dyin raw_spin_lock(&old_base->lock); raw_spin_lock_nested(&new_base->lock, SINGLE_DEPTH_NESTING); =20 - for (i =3D 0; i < HRTIMER_MAX_CLOCK_BASES; i++) { - migrate_hrtimer_list(&old_base->clock_base[i], - &new_base->clock_base[i]); - } + for (int i =3D 0; i < HRTIMER_MAX_CLOCK_BASES; i++) + migrate_hrtimer_list(&old_base->clock_base[i], &new_base->clock_base[i]); =20 /* Tell the other CPU to retrigger the next event */ smp_call_function_single(ncpu, retrigger_next_event, NULL, 0); From nobody Fri Apr 17 00:16:24 2026 Date: Tue, 24 Feb 2026 17:37:14 +0100 Message-ID: <20260224163430.409352042@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 25/48] hrtimer: Evaluate timer expiry only once References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding:
quoted-printable Content-Type: text/plain; charset="utf-8" No point in accessing the timer twice. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- kernel/time/hrtimer.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -810,10 +810,11 @@ static void hrtimer_reprogram(struct hrt { struct hrtimer_cpu_base *cpu_base =3D this_cpu_ptr(&hrtimer_bases); struct hrtimer_clock_base *base =3D timer->base; - ktime_t expires =3D ktime_sub(hrtimer_get_expires(timer), base->offset); + ktime_t expires =3D hrtimer_get_expires(timer); =20 - WARN_ON_ONCE(hrtimer_get_expires(timer) < 0); + WARN_ON_ONCE(expires < 0); =20 + expires =3D ktime_sub(expires, base->offset); /* * CLOCK_REALTIME timer might be requested with an absolute * expiry time which is less than base->offset. Set it to 0. From nobody Fri Apr 17 00:16:24 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3CA883A1CFF for ; Tue, 24 Feb 2026 16:37:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951042; cv=none; b=s7ynxsGG1NlmYMPAmfKxg94/Ys8MAgARxunYR9C83EIu5at/5n65Yvay9h/T9mhlIeWLYapsrwtmzi7OmqFg1nOhVQt1OumZsW2KOwVidWLGUyNRSTBTgOnMaBxANijB8Rp7CPKKk3f2R+jTiwjj99RVi6SxonSDbSELFoweT1o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951042; c=relaxed/simple; bh=V5hLFPfXUNuHS09Rb/NKMXpRP6SYrccxJ/BH6xDunQM=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=ZnLUBQJhXmNrbVjiWgCjIIFCBuxvDuAL0HvOhKgW9dcxDfSDgkzcSaM+NqeIiZZHWPLwCjc3RssxKdT9B/Ts8HtNNmh5cmGRCAQdDcD1rt0pFjcsgYXwGl/6KiBuRMiqn+FcC55/8xpnN98ZrNYT3lI/XVfAmrS4/24BLg9/qpQ= 
Date: Tue, 24 Feb 2026 17:37:18 +0100
Message-ID: <20260224163430.475262618@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 26/48] hrtimer: Replace the bitfield in hrtimer_cpu_base
References: <20260224163022.795809588@kernel.org>

Use bool for the various flags, as that creates better code in the hot path.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/hrtimer_defs.h | 10 +++++-----
 kernel/time/hrtimer.c        | 25 +++++++++++++------------
 2 files changed, 18 insertions(+), 17 deletions(-)

--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -83,11 +83,11 @@ struct hrtimer_cpu_base {
 	unsigned int			cpu;
 	unsigned int			active_bases;
 	unsigned int			clock_was_set_seq;
-	unsigned int			hres_active	: 1,
-					in_hrtirq	: 1,
-					hang_detected	: 1,
-					softirq_activated : 1,
-					online		: 1;
+	bool				hres_active;
+	bool				in_hrtirq;
+	bool				hang_detected;
+	bool				softirq_activated;
+	bool				online;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	unsigned int			nr_events;
 	unsigned short			nr_retries;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -741,7 +741,7 @@ static void hrtimer_switch_to_hres(void)
 		pr_warn("Could not switch to high resolution mode on CPU %u\n", base->cpu);
 		return;
 	}
-	base->hres_active = 1;
+	base->hres_active = true;
 	hrtimer_resolution = HIGH_RES_NSEC;
 
 	tick_setup_sched_timer(true);
@@ -1854,7 +1854,7 @@ static __latent_entropy void hrtimer_run
 	now = hrtimer_update_base(cpu_base);
 	__hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_SOFT);
 
-	cpu_base->softirq_activated = 0;
+	cpu_base->softirq_activated = false;
 	hrtimer_update_softirq_timer(cpu_base, true);
 
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
@@ -1881,7 +1881,7 @@ void hrtimer_interrupt(struct clock_even
 	raw_spin_lock_irqsave(&cpu_base->lock, flags);
 	entry_time = now = hrtimer_update_base(cpu_base);
retry:
-	cpu_base->in_hrtirq = 1;
+	cpu_base->in_hrtirq = true;
 	/*
 	 * Set expires_next to KTIME_MAX, which prevents that remote CPUs queue
 	 * timers while __hrtimer_run_queues() is expiring the clock bases.
@@ -1892,7 +1892,7 @@ void hrtimer_interrupt(struct clock_even
 
 	if (!ktime_before(now, cpu_base->softirq_expires_next)) {
 		cpu_base->softirq_expires_next = KTIME_MAX;
-		cpu_base->softirq_activated = 1;
+		cpu_base->softirq_activated = true;
 		raise_timer_softirq(HRTIMER_SOFTIRQ);
 	}
 
@@ -1905,12 +1905,12 @@ void hrtimer_interrupt(struct clock_even
 	 * against it.
 	 */
 	cpu_base->expires_next = expires_next;
-	cpu_base->in_hrtirq = 0;
+	cpu_base->in_hrtirq = false;
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 
 	/* Reprogramming necessary ? */
 	if (!tick_program_event(expires_next, 0)) {
-		cpu_base->hang_detected = 0;
+		cpu_base->hang_detected = false;
 		return;
 	}
 
@@ -1939,7 +1939,7 @@ void hrtimer_interrupt(struct clock_even
 	 * time away.
 	 */
 	cpu_base->nr_hangs++;
-	cpu_base->hang_detected = 1;
+	cpu_base->hang_detected = true;
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 
 	delta = ktime_sub(now, entry_time);
@@ -1987,7 +1987,7 @@ void hrtimer_run_queues(void)
 
 	if (!ktime_before(now, cpu_base->softirq_expires_next)) {
 		cpu_base->softirq_expires_next = KTIME_MAX;
-		cpu_base->softirq_activated = 1;
+		cpu_base->softirq_activated = true;
 		raise_timer_softirq(HRTIMER_SOFTIRQ);
 	}
 
@@ -2239,13 +2239,14 @@ int hrtimers_cpu_starting(unsigned int c
 
 	/* Clear out any left over state from a CPU down operation */
 	cpu_base->active_bases = 0;
-	cpu_base->hres_active = 0;
-	cpu_base->hang_detected = 0;
+	cpu_base->hres_active = false;
+	cpu_base->hang_detected = false;
 	cpu_base->next_timer = NULL;
 	cpu_base->softirq_next_timer = NULL;
 	cpu_base->expires_next = KTIME_MAX;
 	cpu_base->softirq_expires_next = KTIME_MAX;
-	cpu_base->online = 1;
+	cpu_base->softirq_activated = false;
+	cpu_base->online = true;
 	return 0;
 }
 
@@ -2303,7 +2304,7 @@ int hrtimers_cpu_dying(unsigned int dyin
 	smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
 
 	raw_spin_unlock(&new_base->lock);
-	old_base->online = 0;
+	old_base->online = false;
 	raw_spin_unlock(&old_base->lock);
 
 	return 0;
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:23 +0100
Message-ID: <20260224163430.542427240@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 27/48] hrtimer: Convert state and properties to boolean
References: <20260224163022.795809588@kernel.org>

All 'u8' flags are true booleans, so make it entirely clear that these can
only contain true or false.

This is especially true for hrtimer::state, which is a historical leftover
from manipulating the state with bitwise operations. The early hrtimer
implementation used several state bits, which were later collapsed into a
boolean state. But that conversion failed to replace the bit OR and bit
check operations all over the place, which creates suboptimal code.

As of today 'state' is a misnomer because its only purpose is to reflect
whether the timer is enqueued into the RB-tree or not. Rename it to
'is_queued' and make all operations on it boolean.

This reduces text size from 8926 to 8732 bytes.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/hrtimer.h       | 31 +---------------------
 include/linux/hrtimer_types.h | 12 ++++----
 kernel/time/hrtimer.c         | 58 ++++++++++++++++++++++++++++--------------
 kernel/time/timer_list.c      |  2 -
 4 files changed, 49 insertions(+), 54 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -63,33 +63,6 @@ enum hrtimer_mode {
 	HRTIMER_MODE_REL_PINNED_HARD = HRTIMER_MODE_REL_PINNED | HRTIMER_MODE_HARD,
 };
 
-/*
- * Values to track state of the timer
- *
- * Possible states:
- *
- * 0x00		inactive
- * 0x01		enqueued into rbtree
- *
- * The callback state is not part of the timer->state because clearing it would
- * mean touching the timer after the callback, this makes it impossible to free
- * the timer from the callback function.
- *
- * Therefore we track the callback state in:
- *
- *	timer->base->cpu_base->running == timer
- *
- * On SMP it is possible to have a "callback function running and enqueued"
- * status. It happens for example when a posix timer expired and the callback
- * queued a signal. Between dropping the lock which protects the posix timer
- * and reacquiring the base lock of the hrtimer, another CPU can deliver the
- * signal and rearm the timer.
- *
- * All state transitions are protected by cpu_base->lock.
- */
-#define HRTIMER_STATE_INACTIVE	0x00
-#define HRTIMER_STATE_ENQUEUED	0x01
-
 /**
  * struct hrtimer_sleeper - simple sleeper structure
  * @timer:	embedded timer structure
@@ -300,8 +273,8 @@ extern bool hrtimer_active(const struct
  */
 static inline bool hrtimer_is_queued(struct hrtimer *timer)
 {
-	/* The READ_ONCE pairs with the update functions of timer->state */
-	return !!(READ_ONCE(timer->state) & HRTIMER_STATE_ENQUEUED);
+	/* The READ_ONCE pairs with the update functions of timer->is_queued */
+	return READ_ONCE(timer->is_queued);
 }
 
 /*
--- a/include/linux/hrtimer_types.h
+++ b/include/linux/hrtimer_types.h
@@ -28,7 +28,7 @@ enum hrtimer_restart {
  *		was armed.
  * @function:	timer expiry callback function
  * @base:	pointer to the timer base (per cpu and per clock)
- * @state:	state information (See bit values above)
+ * @is_queued:	Indicates whether a timer is enqueued or not
  * @is_rel:	Set if the timer was armed relative
  * @is_soft:	Set if hrtimer will be expired in soft interrupt context.
  * @is_hard:	Set if hrtimer will be expired in hard interrupt context
@@ -43,11 +43,11 @@ struct hrtimer {
 	ktime_t				_softexpires;
 	enum hrtimer_restart		(*__private function)(struct hrtimer *);
 	struct hrtimer_clock_base	*base;
-	u8				state;
-	u8				is_rel;
-	u8				is_soft;
-	u8				is_hard;
-	u8				is_lazy;
+	bool				is_queued;
+	bool				is_rel;
+	bool				is_soft;
+	bool				is_hard;
+	bool				is_lazy;
 };
 
 #endif /* _LINUX_HRTIMER_TYPES_H */
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -50,6 +50,28 @@
 #include "tick-internal.h"
 
 /*
+ * Constants to set the queued state of the timer (INACTIVE, ENQUEUED)
+ *
+ * The callback state is kept separate in the CPU base because having it in
+ * the timer would require touching the timer after the callback, which
+ * makes it impossible to free the timer from the callback function.
+ *
+ * Therefore we track the callback state in:
+ *
+ *	timer->base->cpu_base->running == timer
+ *
+ * On SMP it is possible to have a "callback function running and enqueued"
+ * status. It happens for example when a posix timer expired and the callback
+ * queued a signal. Between dropping the lock which protects the posix timer
+ * and reacquiring the base lock of the hrtimer, another CPU can deliver the
+ * signal and rearm the timer.
+ *
+ * All state transitions are protected by cpu_base->lock.
+ */
+#define HRTIMER_STATE_INACTIVE	false
+#define HRTIMER_STATE_ENQUEUED	true
+
+/*
  * The resolution of the clocks. The resolution value is returned in
  * the clock_getres() system call to give application programmers an
  * idea of the (in)accuracy of timers. Timer values are rounded up to
@@ -1038,7 +1060,7 @@ u64 hrtimer_forward(struct hrtimer *time
 	if (delta < 0)
 		return 0;
 
-	if (WARN_ON(timer->state & HRTIMER_STATE_ENQUEUED))
+	if (WARN_ON(timer->is_queued))
 		return 0;
 
 	if (interval < hrtimer_resolution)
@@ -1082,7 +1104,7 @@ static bool enqueue_hrtimer(struct hrtim
 	base->cpu_base->active_bases |= 1 << base->index;
 
 	/* Pairs with the lockless read in hrtimer_is_queued() */
-	WRITE_ONCE(timer->state, HRTIMER_STATE_ENQUEUED);
+	WRITE_ONCE(timer->is_queued, HRTIMER_STATE_ENQUEUED);
 
 	return timerqueue_add(&base->active, &timer->node);
 }
@@ -1096,18 +1118,18 @@ static bool enqueue_hrtimer(struct hrtim
  * anyway (e.g. timer interrupt)
 */
 static void __remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
-			     u8 newstate, bool reprogram)
+			     bool newstate, bool reprogram)
 {
 	struct hrtimer_cpu_base *cpu_base = base->cpu_base;
-	u8 state = timer->state;
 
 	lockdep_assert_held(&cpu_base->lock);
 
-	/* Pairs with the lockless read in hrtimer_is_queued() */
-	WRITE_ONCE(timer->state, newstate);
-	if (!(state & HRTIMER_STATE_ENQUEUED))
+	if (!timer->is_queued)
 		return;
 
+	/* Pairs with the lockless read in hrtimer_is_queued() */
+	WRITE_ONCE(timer->is_queued, newstate);
+
 	if (!timerqueue_del(&base->active, &timer->node))
 		cpu_base->active_bases &= ~(1 << base->index);
 
@@ -1127,11 +1149,11 @@ static void __remove_hrti
 static inline bool remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
 				  bool restart, bool keep_local)
 {
-	u8 state = timer->state;
+	bool queued_state = timer->is_queued;
 
 	lockdep_assert_held(&base->cpu_base->lock);
 
-	if (state & HRTIMER_STATE_ENQUEUED) {
+	if (queued_state) {
 		bool reprogram;
 
 		debug_hrtimer_deactivate(timer);
@@ -1153,11 +1175,11 @@ static inline bool remove_hrtimer(struct
 		 * and a moment later when it's requeued).
 		 */
 		if (!restart)
-			state = HRTIMER_STATE_INACTIVE;
+			queued_state = HRTIMER_STATE_INACTIVE;
 		else
 			reprogram &= !keep_local;
 
-		__remove_hrtimer(timer, base, state, reprogram);
+		__remove_hrtimer(timer, base, queued_state, reprogram);
 		return true;
 	}
 	return false;
@@ -1704,7 +1726,7 @@ bool hrtimer_active(const struct hrtimer
 		base = READ_ONCE(timer->base);
 		seq = raw_read_seqcount_begin(&base->seq);
 
-		if (timer->state != HRTIMER_STATE_INACTIVE || base->running == timer)
+		if (timer->is_queued || base->running == timer)
 			return true;
 
 	} while (read_seqcount_retry(&base->seq, seq) || base != READ_ONCE(timer->base));
@@ -1721,7 +1743,7 @@ EXPORT_SYMBOL_GPL(hrtimer_active);
  *  - callback: the timer is being ran
  *  - post: the timer is inactive or (re)queued
  *
- * On the read side we ensure we observe timer->state and cpu_base->running
+ * On the read side we ensure we observe timer->is_queued and cpu_base->running
 * from the same section, if anything changed while we looked at it, we retry.
 * This includes timer->base changing because sequence numbers alone are
 * insufficient for that.
@@ -1744,11 +1766,11 @@ static void __run_hrtimer(struct hrtimer
 	base->running = timer;
 
 	/*
-	 * Separate the ->running assignment from the ->state assignment.
+	 * Separate the ->running assignment from the ->is_queued assignment.
	 *
 	 * As with a regular write barrier, this ensures the read side in
 	 * hrtimer_active() cannot observe base->running == NULL &&
-	 * timer->state == INACTIVE.
+	 * timer->is_queued == INACTIVE.
 	 */
 	raw_write_seqcount_barrier(&base->seq);
 
@@ -1787,15 +1809,15 @@ static void __run_hrtimer
 	 * hrtimer_start_range_ns() can have popped in and enqueued the timer
 	 * for us already.
 	 */
-	if (restart != HRTIMER_NORESTART && !(timer->state & HRTIMER_STATE_ENQUEUED))
+	if (restart == HRTIMER_RESTART && !timer->is_queued)
 		enqueue_hrtimer(timer, base, HRTIMER_MODE_ABS, false);
 
 	/*
-	 * Separate the ->running assignment from the ->state assignment.
+	 * Separate the ->running assignment from the ->is_queued assignment.
 	 *
 	 * As with a regular write barrier, this ensures the read side in
 	 * hrtimer_active() cannot observe base->running.timer == NULL &&
-	 * timer->state == INACTIVE.
+	 * timer->is_queued == INACTIVE.
 	 */
 	raw_write_seqcount_barrier(&base->seq);
 
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -47,7 +47,7 @@ print_timer(struct seq_file *m, struct h
 	    int idx, u64 now)
 {
 	SEQ_printf(m, " #%d: <%p>, %ps", idx, taddr, ACCESS_PRIVATE(timer, function));
-	SEQ_printf(m, ", S:%02x", timer->state);
+	SEQ_printf(m, ", S:%02x", timer->is_queued);
 	SEQ_printf(m, "\n");
 	SEQ_printf(m, " # expires at %Lu-%Lu nsecs [in %Ld to %Ld nsecs]\n",
 		   (unsigned long long)ktime_to_ns(hrtimer_get_softexpires(timer)),
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:28 +0100
Message-ID: <20260224163430.607935269@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 28/48] hrtimer: Optimize for local timers
References: <20260224163022.795809588@kernel.org>

The decision whether to keep timers on the local CPU or on the CPU they
are associated to is suboptimal and causes the expensive
switch_hrtimer_base() mechanism to be invoked more often than necessary.
This is especially true for pinned timers.

Rewrite the decision logic so that the current base is kept if:

  1) The callback is running on the base

  2) The timer is associated to the local CPU and is the first expiring
     timer, as that allows to optimize for reprogramming avoidance

  3) The timer is associated to the local CPU and pinned

  4) The timer is associated to the local CPU and timer migration is
     disabled

Only #2 was covered by the original code, but especially #3 makes a
difference for high frequency rearming timers like the scheduler hrtick
timer. If timer migration is disabled, then #4 avoids most of the base
switches.

Signed-off-by: Thomas Gleixner
---
 kernel/time/hrtimer.c | 101 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 65 insertions(+), 36 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1147,7 +1147,7 @@ static void __remove_hrtimer(struct hrti
 }
 
 static inline bool remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
-				  bool restart, bool keep_local)
+				  bool restart, bool keep_base)
 {
 	bool queued_state = timer->is_queued;
 
@@ -1177,7 +1177,7 @@ static inline bool remove_hrtimer(struct
 		if (!restart)
 			queued_state = HRTIMER_STATE_INACTIVE;
 		else
-			reprogram &= !keep_local;
+			reprogram &= !keep_base;
 
 		__remove_hrtimer(timer, base, queued_state, reprogram);
 		return true;
@@ -1220,29 +1220,57 @@ static void hrtimer_update_softirq_timer
 	hrtimer_reprogram(cpu_base->softirq_next_timer, reprogram);
 }
 
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+static __always_inline bool hrtimer_prefer_local(bool is_local, bool is_first, bool is_pinned)
+{
+	if (static_branch_likely(&timers_migration_enabled)) {
+		/*
+		 * If it is local and the first expiring timer keep it on the local
+		 * CPU to optimize reprogramming of the clockevent device. Also
+		 * avoid switch_hrtimer_base() overhead when local and pinned.
+		 */
+		if (!is_local)
+			return false;
+		return is_first || is_pinned;
+	}
+	return is_local;
+}
+#else
+static __always_inline bool hrtimer_prefer_local(bool is_local, bool is_first, bool is_pinned)
+{
+	return is_local;
+}
+#endif
+
+static inline bool hrtimer_keep_base(struct hrtimer *timer, bool is_local, bool is_first,
+				     bool is_pinned)
+{
+	/* If the timer is running the callback it has to stay on its CPU base. */
+	if (unlikely(timer->base->running == timer))
+		return true;
+
+	return hrtimer_prefer_local(is_local, is_first, is_pinned);
+}
+
 static bool __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, u64 delta_ns,
 				     const enum hrtimer_mode mode,
 				     struct hrtimer_clock_base *base)
 {
 	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
-	struct hrtimer_clock_base *new_base;
-	bool force_local, first, was_armed;
+	bool is_pinned, first, was_first, was_armed, keep_base = false;
+	struct hrtimer_cpu_base *cpu_base = base->cpu_base;
 
-	/*
-	 * If the timer is on the local cpu base and is the first expiring
-	 * timer then this might end up reprogramming the hardware twice
-	 * (on removal and on enqueue). To avoid that prevent the reprogram
-	 * on removal, keep the timer local to the current CPU and enforce
-	 * reprogramming after it is queued no matter whether it is the new
-	 * first expiring timer again or not.
-	 */
-	force_local = base->cpu_base == this_cpu_base;
-	force_local &= base->cpu_base->next_timer == timer;
+	was_first = cpu_base->next_timer == timer;
+	is_pinned = !!(mode & HRTIMER_MODE_PINNED);
 
 	/*
-	 * Don't force local queuing if this enqueue happens on a unplugged
-	 * CPU after hrtimer_cpu_dying() has been invoked.
+	 * Don't keep it local if this enqueue happens on an unplugged CPU
+	 * after hrtimer_cpu_dying() has been invoked.
 	 */
-	force_local &= this_cpu_base->online;
+	if (likely(this_cpu_base->online)) {
+		bool is_local = cpu_base == this_cpu_base;
+
+		keep_base = hrtimer_keep_base(timer, is_local, was_first, is_pinned);
+	}
 
 	/*
 	 * Remove an active timer from the queue. In case it is not queued
@@ -1254,8 +1282,11 @@ static bool __hrtimer_start_range_ns(str
 	 * reprogramming later if it was the first expiring timer. This
 	 * avoids programming the underlying clock event twice (once at
 	 * removal and once after enqueue).
+	 *
+	 * @keep_base is also true if the timer callback is running on a
+	 * remote CPU and for local pinned timers.
 	 */
-	was_armed = remove_hrtimer(timer, base, true, force_local);
+	was_armed = remove_hrtimer(timer, base, true, keep_base);
 
 	if (mode & HRTIMER_MODE_REL)
 		tim = ktime_add_safe(tim, __hrtimer_cb_get_time(base->clockid));
@@ -1265,21 +1296,21 @@ static bool __hrtimer_start_range_ns(str
 	hrtimer_set_expires_range_ns(timer, tim, delta_ns);
 
 	/* Switch the timer base, if necessary: */
-	if (!force_local)
-		new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-	else
-		new_base = base;
+	if (!keep_base) {
+		base = switch_hrtimer_base(timer, base, is_pinned);
+		cpu_base = base->cpu_base;
+	}
 
-	first = enqueue_hrtimer(timer, new_base, mode, was_armed);
+	first = enqueue_hrtimer(timer, base, mode, was_armed);
 
 	/*
 	 * If the hrtimer interrupt is running, then it will reevaluate the
 	 * clock bases and reprogram the clock event device.
 	 */
-	if (new_base->cpu_base->in_hrtirq)
+	if (cpu_base->in_hrtirq)
 		return false;
 
-	if (!force_local) {
+	if (!was_first || cpu_base != this_cpu_base) {
 		/*
 		 * If the current CPU base is online, then the timer is never
 		 * queued on a remote CPU if it would be the first expiring
@@ -1288,7 +1319,7 @@ static bool __hrtimer_start_range_ns(str
 		 * re-evaluate the first expiring timer after completing the
 		 * callbacks.
 		 */
-		if (hrtimer_base_is_online(this_cpu_base))
+		if (likely(hrtimer_base_is_online(this_cpu_base)))
 			return first;
 
 		/*
@@ -1296,11 +1327,8 @@ static bool __hrtimer_start_range_ns(str
 		 * already offline. If the timer is the first to expire,
 		 * kick the remote CPU to reprogram the clock event.
 		 */
-		if (first) {
-			struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
-
-			smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
-		}
+		if (first)
+			smp_call_function_single_async(cpu_base->cpu, &cpu_base->csd);
 		return false;
 	}
 
@@ -1314,16 +1342,17 @@ static bool __hrtimer_start_range_ns(str
 	 * required.
 	 */
 	if (timer->is_lazy) {
-		if (new_base->cpu_base->expires_next <= hrtimer_get_expires(timer))
+		if (cpu_base->expires_next <= hrtimer_get_expires(timer))
 			return false;
 	}
 
 	/*
-	 * Timer was forced to stay on the current CPU to avoid
-	 * reprogramming on removal and enqueue. Force reprogram the
-	 * hardware by evaluating the new first expiring timer.
+	 * Timer was the first expiring timer and forced to stay on the
+	 * current CPU to avoid reprogramming on removal and enqueue. Force
+	 * reprogram the hardware by evaluating the new first expiring
+	 * timer.
 	 */
-	hrtimer_force_reprogram(new_base->cpu_base, /* skip_equal */ true);
+	hrtimer_force_reprogram(cpu_base, /* skip_equal */ true);
 	return false;
 }
From nobody Fri Apr 17 00:16:24 2026
dkMaMXVsb1DbPxs/Tg04EHTYb4zi3FwunbDEA8TYYnuEzjBgwPdKmwOcDEjlJlMYh7 mo9y8ATu+K4xsKYU4l2HOVDDMffxnWwj6dFABprDDMjXg+hkPtmw8ChAHIDAbLKgx2 NIR+phpycpqTHhgSLxpCOtviSoZtKYBaH3JdiosYAcjKOWixKuJV+OmIl5GAGD4GjV VdlFom9U2aQMZKe6bdJy//n0SESN0e4QRNblhe0qgwxKvFt+/DsWtcK1x3zo0i08J2 VIaSXYrhIgIaQ== Date: Tue, 24 Feb 2026 17:37:33 +0100 Message-ID: <20260224163430.673473029@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 29/48] hrtimer: Use NOHZ information for locality References: <20260224163022.795809588@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The decision to keep a timer which is associated to the local CPU on that CPU does not take NOHZ information into account. As a result there are a lot of hrtimer base switch invocations which end up not switching the base and stay on the local CPU. That's just work for nothing and can be further improved. If the local CPU is part of the NOISE housekeeping mask, then check: 1) Whether the local CPU has the tick running, which means it is either not idle or already expecting a timer soon. 2) Whether the tick is stopped and need_resched() is set, which means the CPU is about to exit idle. This reduces the amount of hrtimer base switch attempts, which end up on the local CPU anyway, significantly and prepares for further optimizations. 
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1231,7 +1231,18 @@ static __always_inline bool hrtimer_pref
 		 */
 		if (!is_local)
 			return false;
-		return is_first || is_pinned;
+		if (is_first || is_pinned)
+			return true;
+
+		/* Honour the NOHZ full restrictions */
+		if (!housekeeping_cpu(smp_processor_id(), HK_TYPE_KERNEL_NOISE))
+			return false;
+
+		/*
+		 * If the tick is not stopped or need_resched() is set, then
+		 * there is no point in moving the timer somewhere else.
+		 */
+		return !tick_nohz_tick_stopped() || need_resched();
 	}
 	return is_local;
 }

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:38 +0100
Message-ID: <20260224163430.737600486@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 30/48] hrtimer: Separate remove/enqueue handling for local timers
References: <20260224163022.795809588@kernel.org>

As the base switch can be avoided completely when the base stays the same, the remove/enqueue handling can be streamlined. Split it out into a separate function which handles both in one go. That is more efficient and makes the code simpler to follow.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c | 72 +++++++++++++++++++++++++++++----------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1147,13 +1147,11 @@ static void __remove_hrtimer(struct hrti
 }
 
 static inline bool remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
-				  bool restart, bool keep_base)
+				  bool newstate)
 {
-	bool queued_state = timer->is_queued;
-
 	lockdep_assert_held(&base->cpu_base->lock);
 
-	if (queued_state) {
+	if (timer->is_queued) {
 		bool reprogram;
 
 		debug_hrtimer_deactivate(timer);
@@ -1168,23 +1166,35 @@ static inline bool remove_hrtimer(struct
 		 */
 		reprogram = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
 
-		/*
-		 * If the timer is not restarted then reprogramming is
-		 * required if the timer is local. If it is local and about
-		 * to be restarted, avoid programming it twice (on removal
-		 * and a moment later when it's requeued).
-		 */
-		if (!restart)
-			queued_state = HRTIMER_STATE_INACTIVE;
-		else
-			reprogram &= !keep_base;
-
-		__remove_hrtimer(timer, base, queued_state, reprogram);
+		__remove_hrtimer(timer, base, newstate, reprogram);
 		return true;
 	}
 	return false;
 }
 
+static inline bool
+remove_and_enqueue_same_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
+			     const enum hrtimer_mode mode, ktime_t expires, u64 delta_ns)
+{
+	/* Remove it from the timer queue if active */
+	if (timer->is_queued) {
+		debug_hrtimer_deactivate(timer);
+		timerqueue_del(&base->active, &timer->node);
+	}
+
+	/* Set the new expiry time */
+	hrtimer_set_expires_range_ns(timer, expires, delta_ns);
+
+	debug_activate(timer, mode, timer->is_queued);
+	base->cpu_base->active_bases |= 1 << base->index;
+
+	/* Pairs with the lockless read in hrtimer_is_queued() */
+	WRITE_ONCE(timer->is_queued, HRTIMER_STATE_ENQUEUED);
+
+	/* Returns true if this is the first expiring timer */
+	return timerqueue_add(&base->active, &timer->node);
+}
+
 static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim,
 					    const enum hrtimer_mode mode)
 {
@@ -1267,7 +1277,7 @@ static bool __hrtimer_start_range_ns(str
 				     const enum hrtimer_mode mode, struct hrtimer_clock_base *base)
 {
 	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
-	bool is_pinned, first, was_first, was_armed, keep_base = false;
+	bool is_pinned, first, was_first, keep_base = false;
 	struct hrtimer_cpu_base *cpu_base = base->cpu_base;
 
 	was_first = cpu_base->next_timer == timer;
@@ -1283,6 +1293,12 @@ static bool __hrtimer_start_range_ns(str
 		keep_base = hrtimer_keep_base(timer, is_local, was_first, is_pinned);
 	}
 
+	/* Calculate absolute expiry time for relative timers */
+	if (mode & HRTIMER_MODE_REL)
+		tim = ktime_add_safe(tim, __hrtimer_cb_get_time(base->clockid));
+	/* Compensate for low resolution granularity */
+	tim = hrtimer_update_lowres(timer, tim, mode);
+
 	/*
 	 * Remove an active timer from the queue. In case it is not queued
 	 * on the current CPU, make sure that remove_hrtimer() updates the
@@ -1297,22 +1313,20 @@ static bool __hrtimer_start_range_ns(str
 	 * @keep_base is also true if the timer callback is running on a
 	 * remote CPU and for local pinned timers.
 	 */
-	was_armed = remove_hrtimer(timer, base, true, keep_base);
-
-	if (mode & HRTIMER_MODE_REL)
-		tim = ktime_add_safe(tim, __hrtimer_cb_get_time(base->clockid));
-
-	tim = hrtimer_update_lowres(timer, tim, mode);
+	if (likely(keep_base)) {
+		first = remove_and_enqueue_same_base(timer, base, mode, tim, delta_ns);
+	} else {
+		/* Keep the ENQUEUED state in case it is queued */
+		bool was_armed = remove_hrtimer(timer, base, HRTIMER_STATE_ENQUEUED);
 
-	hrtimer_set_expires_range_ns(timer, tim, delta_ns);
+		hrtimer_set_expires_range_ns(timer, tim, delta_ns);
 
-	/* Switch the timer base, if necessary: */
-	if (!keep_base) {
+		/* Switch the timer base, if necessary: */
 		base = switch_hrtimer_base(timer, base, is_pinned);
 		cpu_base = base->cpu_base;
-	}
 
-	first = enqueue_hrtimer(timer, base, mode, was_armed);
+		first = enqueue_hrtimer(timer, base, mode, was_armed);
+	}
 
 	/*
 	 * If the hrtimer interrupt is running, then it will reevaluate the
@@ -1432,7 +1446,7 @@ int hrtimer_try_to_cancel(struct hrtimer
 	base = lock_hrtimer_base(timer, &flags);
 
 	if (!hrtimer_callback_running(timer)) {
-		ret = remove_hrtimer(timer, base, false, false);
+		ret = remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE);
 		if (ret)
 			trace_hrtimer_cancel(timer);
 	}

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:43 +0100
Message-ID: <20260224163430.803669745@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 31/48] hrtimer: Add hrtimer_rearm tracepoint
References: <20260224163022.795809588@kernel.org>

Analyzing the reprogramming of the clock event device is essential to debug the behaviour of the hrtimer subsystem, especially with the upcoming deferred rearming scheme.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/trace/events/timer.h | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)

--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -325,6 +325,30 @@ DEFINE_EVENT(hrtimer_class, hrtimer_canc
 );
 
 /**
+ * hrtimer_rearm - Invoked when the clockevent device is rearmed
+ * @next_event: The next expiry time (CLOCK_MONOTONIC)
+ */
+TRACE_EVENT(hrtimer_rearm,
+
+	TP_PROTO(ktime_t next_event, bool deferred),
+
+	TP_ARGS(next_event, deferred),
+
+	TP_STRUCT__entry(
+		__field( s64, next_event )
+		__field( bool, deferred )
+	),
+
+	TP_fast_assign(
+		__entry->next_event = next_event;
+		__entry->deferred = deferred;
+	),
+
+	TP_printk("next_event=%llu deferred=%d",
+		  (unsigned long long) __entry->next_event, __entry->deferred)
+);
+
+/**
  * itimer_state - called when itimer is started or canceled
  * @which: name of the interval timer
  * @value: the itimers value, itimer is canceled if value->it_value is

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:48 +0100
Message-ID: <20260224163430.870639266@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 32/48] hrtimer: Re-arrange hrtimer_interrupt()
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

Rework hrtimer_interrupt() such that reprogramming is split out into an independent function at the end of the interrupt. This prepares for reprogramming getting delayed beyond the end of hrtimer_interrupt().

Notably, this changes the hang handling to always wait 100ms instead of trying to keep it proportional to the actual delay. This simplifies the state, and this situation really should not be happening in the first place.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
tglx: Added the tracepoint and used a proper naming convention
---
 kernel/time/hrtimer.c | 93 +++++++++++++++++++++++----------------------------
 1 file changed, 44 insertions(+), 49 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -690,6 +690,12 @@ static inline int hrtimer_hres_active(st
 		cpu_base->hres_active : 0;
 }
 
+static inline void hrtimer_rearm_event(ktime_t expires_next, bool deferred)
+{
+	trace_hrtimer_rearm(expires_next, deferred);
+	tick_program_event(expires_next, 1);
+}
+
 static void __hrtimer_reprogram(struct hrtimer_cpu_base *cpu_base, struct hrtimer *next_timer,
 				ktime_t expires_next)
 {
@@ -715,7 +721,7 @@ static void __hrtimer_reprogram(struct h
 	if (!hrtimer_hres_active(cpu_base) || cpu_base->hang_detected)
 		return;
 
-	tick_program_event(expires_next, 1);
+	hrtimer_rearm_event(expires_next, false);
 }
 
 /*
@@ -1939,6 +1945,28 @@ static __latent_entropy void hrtimer_run
 #ifdef CONFIG_HIGH_RES_TIMERS
 
 /*
+ * Very similar to hrtimer_force_reprogram(), except it deals with
+ * in_hrtirq and hang_detected.
+ */
+static void hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now)
+{
+	ktime_t expires_next = hrtimer_update_next_event(cpu_base);
+
+	cpu_base->expires_next = expires_next;
+	cpu_base->in_hrtirq = false;
+
+	if (unlikely(cpu_base->hang_detected)) {
+		/*
+		 * Give the system a chance to do something else than looping
+		 * on hrtimer interrupts.
+		 */
+		expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
+		cpu_base->hang_detected = false;
+	}
+	hrtimer_rearm_event(expires_next, false);
+}
+
+/*
 * High resolution timer interrupt
 * Called with interrupts disabled
 */
@@ -1973,63 +2001,30 @@ void hrtimer_interrupt(struct clock_even
 
 	__hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);
 
-	/* Reevaluate the clock bases for the [soft] next expiry */
-	expires_next = hrtimer_update_next_event(cpu_base);
-	/*
-	 * Store the new expiry value so the migration code can verify
-	 * against it.
-	 */
-	cpu_base->expires_next = expires_next;
-	cpu_base->in_hrtirq = false;
-	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
-
-	/* Reprogramming necessary ? */
-	if (!tick_program_event(expires_next, 0)) {
-		cpu_base->hang_detected = false;
-		return;
-	}
-
 	/*
 	 * The next timer was already expired due to:
 	 * - tracing
 	 * - long lasting callbacks
 	 * - being scheduled away when running in a VM
 	 *
-	 * We need to prevent that we loop forever in the hrtimer
-	 * interrupt routine. We give it 3 attempts to avoid
-	 * overreacting on some spurious event.
-	 *
-	 * Acquire base lock for updating the offsets and retrieving
-	 * the current time.
+	 * We need to prevent that we loop forever in the hrtimer interrupt
+	 * routine. We give it 3 attempts to avoid overreacting on some
+	 * spurious event.
 	 */
-	raw_spin_lock_irqsave(&cpu_base->lock, flags);
 	now = hrtimer_update_base(cpu_base);
-	cpu_base->nr_retries++;
-	if (++retries < 3)
-		goto retry;
-	/*
-	 * Give the system a chance to do something else than looping
-	 * here. We stored the entry time, so we know exactly how long
-	 * we spent here. We schedule the next event this amount of
-	 * time away.
-	 */
-	cpu_base->nr_hangs++;
-	cpu_base->hang_detected = true;
-	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+	expires_next = hrtimer_update_next_event(cpu_base);
+	if (expires_next < now) {
+		if (++retries < 3)
+			goto retry;
 
-	delta = ktime_sub(now, entry_time);
-	if ((unsigned int)delta > cpu_base->max_hang_time)
-		cpu_base->max_hang_time = (unsigned int) delta;
-	/*
-	 * Limit it to a sensible value as we enforce a longer
-	 * delay. Give the CPU at least 100ms to catch up.
-	 */
-	if (delta > 100 * NSEC_PER_MSEC)
-		expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
-	else
-		expires_next = ktime_add(now, delta);
-	tick_program_event(expires_next, 1);
-	pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
+		delta = ktime_sub(now, entry_time);
+		cpu_base->max_hang_time = max_t(unsigned int, cpu_base->max_hang_time, delta);
+		cpu_base->nr_hangs++;
+		cpu_base->hang_detected = true;
+	}
+
+	hrtimer_rearm(cpu_base, now);
+	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 }
 #endif /* !CONFIG_HIGH_RES_TIMERS */

From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:53 +0100
Message-ID: <20260224163430.935623347@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 33/48] hrtimer: Rename hrtimer_cpu_base::in_hrtirq to deferred_rearm
References: <20260224163022.795809588@kernel.org>

The upcoming deferred rearming scheme has the same effect as deferring the rearm while the hrtimer interrupt is executing. So it can reuse the in_hrtirq flag, but once the rearm can be deferred beyond the hrtimer interrupt path, that name does not make sense anymore.

Rename it to deferred_rearm upfront to keep the actual functional change separate from the mechanical rename churn.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/hrtimer_defs.h | 4 ++--
 kernel/time/hrtimer.c | 28 +++++++++-------------------
 2 files changed, 11 insertions(+), 21 deletions(-)

--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -53,7 +53,7 @@ enum hrtimer_base_type {
 * @active_bases: Bitfield to mark bases with active timers
 * @clock_was_set_seq: Sequence counter of clock was set events
 * @hres_active: State of high resolution mode
- * @in_hrtirq: hrtimer_interrupt() is currently executing
+ * @deferred_rearm: A deferred rearm is pending
 * @hang_detected: The last hrtimer interrupt detected a hang
 * @softirq_activated: displays, if the softirq is raised - update of softirq
 * related settings is not required then.
@@ -84,7 +84,7 @@ struct hrtimer_cpu_base {
 	unsigned int active_bases;
 	unsigned int clock_was_set_seq;
 	bool hres_active;
-	bool in_hrtirq;
+	bool deferred_rearm;
 	bool hang_detected;
 	bool softirq_activated;
 	bool online;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -883,11 +883,8 @@ static void hrtimer_reprogram(struct hrt
 	if (expires >= cpu_base->expires_next)
 		return;
 
-	/*
-	 * If the hrtimer interrupt is running, then it will reevaluate the
-	 * clock bases and reprogram the clock event device.
-	 */
-	if (cpu_base->in_hrtirq)
+	/* If a deferred rearm is pending skip reprogramming the device */
+	if (cpu_base->deferred_rearm)
 		return;
 
 	cpu_base->next_timer = timer;
@@ -921,12 +918,8 @@ static bool update_needs_ipi(struct hrti
 	if (seq == cpu_base->clock_was_set_seq)
 		return false;
 
-	/*
-	 * If the remote CPU is currently handling an hrtimer interrupt, it
-	 * will reevaluate the first expiring timer of all clock bases
-	 * before reprogramming. Nothing to do here.
-	 */
-	if (cpu_base->in_hrtirq)
+	/* If a deferred rearm is pending the remote CPU will take care of it */
+	if (cpu_base->deferred_rearm)
 		return false;
 
 	/*
@@ -1334,11 +1327,8 @@ static bool __hrtimer_start_range_ns(str
 		first = enqueue_hrtimer(timer, base, mode, was_armed);
 	}
 
-	/*
-	 * If the hrtimer interrupt is running, then it will reevaluate the
-	 * clock bases and reprogram the clock event device.
-	 */
-	if (cpu_base->in_hrtirq)
+	/* If a deferred rearm is pending skip reprogramming the device */
+	if (cpu_base->deferred_rearm)
 		return false;
 
 	if (!was_first || cpu_base != this_cpu_base) {
@@ -1946,14 +1936,14 @@ static __latent_entropy void hrtimer_run
 
 /*
  * Very similar to hrtimer_force_reprogram(), except it deals with
- * in_hrtirq and hang_detected.
+ * deferred_rearm and hang_detected.
 */
 static void hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now)
 {
 	ktime_t expires_next = hrtimer_update_next_event(cpu_base);
 
 	cpu_base->expires_next = expires_next;
-	cpu_base->in_hrtirq = false;
+	cpu_base->deferred_rearm = false;
 
 	if (unlikely(cpu_base->hang_detected)) {
 		/*
@@ -1984,7 +1974,7 @@ void hrtimer_interrupt(struct clock_even
 	raw_spin_lock_irqsave(&cpu_base->lock, flags);
 	entry_time = now = hrtimer_update_base(cpu_base);
retry:
-	cpu_base->in_hrtirq = true;
+	cpu_base->deferred_rearm = true;
 	/*
 	 * Set expires_next to KTIME_MAX, which prevents that remote CPUs queue
 	 * timers while __hrtimer_run_queues() is expiring the clock bases.
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:37:58 +0100
Message-ID: <20260224163431.000891171@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 34/48] hrtimer: Prepare stubs for deferred rearming
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

The hrtimer interrupt expires timers, and at the end of the interrupt it rearms the clockevent device for the next expiring timer. That is obviously correct, but in the case that an expired timer set NEED_RESCHED, the return from interrupt ends up in schedule(). If HRTICK is enabled, then schedule() will modify the hrtick timer, which causes another reprogramming of the hardware.

That can be avoided by deferring the rearming to the return from interrupt path, and if the return results in an immediate schedule() invocation, it can be deferred until the end of schedule(). To make this correct, the affected code parts need to be made aware of this.

Provide empty stubs for the deferred rearming mechanism, so that the relevant code changes for entry, softirq and scheduler can be split up into separate changes, independent of the actual enablement in the hrtimer code.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
tglx: Split out to make it simpler to review and to make cross subsystem merge logistics trivial.
---
 include/linux/hrtimer.h       |    1 +
 include/linux/hrtimer_rearm.h |   21 +++++++++++++++++++++
 kernel/time/Kconfig           |    4 ++++
 3 files changed, 26 insertions(+)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -13,6 +13,7 @@
 #define _LINUX_HRTIMER_H
 
 #include
+#include <linux/hrtimer_rearm.h>
 #include
 #include
 #include
--- /dev/null
+++ b/include/linux/hrtimer_rearm.h
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _LINUX_HRTIMER_REARM_H
+#define _LINUX_HRTIMER_REARM_H
+
+#ifdef CONFIG_HRTIMER_REARM_DEFERRED
+static __always_inline void __hrtimer_rearm_deferred(void) { }
+static __always_inline void hrtimer_rearm_deferred(void) { }
+static __always_inline void hrtimer_rearm_deferred_tif(unsigned long tif_work) { }
+static __always_inline bool
+hrtimer_rearm_deferred_user_irq(unsigned long *tif_work, const unsigned long tif_mask) { return false; }
+static __always_inline bool hrtimer_test_and_clear_rearm_deferred(void) { return false; }
+#else /* CONFIG_HRTIMER_REARM_DEFERRED */
+static __always_inline void __hrtimer_rearm_deferred(void) { }
+static __always_inline void hrtimer_rearm_deferred(void) { }
+static __always_inline void hrtimer_rearm_deferred_tif(unsigned long tif_work) { }
+static __always_inline bool
+hrtimer_rearm_deferred_user_irq(unsigned long *tif_work, const unsigned long tif_mask) { return false; }
+static __always_inline bool hrtimer_test_and_clear_rearm_deferred(void) { return false; }
+#endif /* !CONFIG_HRTIMER_REARM_DEFERRED */
+
+#endif
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -58,6 +58,10 @@ config GENERIC_CLOCKEVENTS_COUPLED_INLIN
 config GENERIC_CMOS_UPDATE
 	bool
 
+# Deferred rearming of the hrtimer interrupt
+config HRTIMER_REARM_DEFERRED
+	def_bool n
+
 # Select to handle posix CPU timers from task_work
 # and not from the timer interrupt context
 config HAVE_POSIX_CPU_TIMERS_TASK_WORK
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:03 +0100
Message-ID: <20260224163431.066469985@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 35/48] entry: Prepare for deferred hrtimer rearming
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

The hrtimer interrupt expires timers and, at the end of the interrupt,
rearms the clockevent device for the next expiring timer. That is obviously
correct, but in the case that an expired timer sets NEED_RESCHED, the
return from interrupt ends up in schedule(). If HRTICK is enabled,
schedule() will modify the hrtick timer, which causes another reprogramming
of the hardware.

That can be avoided by deferring the rearming to the return-from-interrupt
path; if the return results in an immediate schedule() invocation, the
rearming can be deferred until the end of schedule(), which avoids multiple
rearms and a re-evaluation of the next expiring timer.

As this is only relevant for the interrupt-to-user return, split the work
masks up and hand them in as arguments from the relevant exit-to-user
functions, which allows the compiler to optimize the deferred handling out
for the syscall-exit-to-user case.

Add the rearm checks to the appropriate places in the exit-to-user loop and
the interrupt-return-to-kernel path, so that the rearming is always
guaranteed.
In the return-to-user-space path this is handled in the same way as
TIF_RSEQ to avoid extra instructions in the fast path. Those are truly
hurtful for device-interrupt-heavy workloads, as the extra instructions and
conditionals, while benign at first sight, accumulate quickly into
measurable regressions.

The return-from-syscall path is completely unaffected due to the above
mentioned split, so syscall-heavy workloads won't carry any extra burden.

For now this just places empty stubs at the right places, which are all
optimized out by the compiler until the actual functionality is in place.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
tglx: Split out to make it simpler to review and to make cross-subsystem
merge logistics trivial.
---
 include/linux/irq-entry-common.h |   25 +++++++++++++++++++------
 include/linux/rseq_entry.h       |   16 +++++++++++++---
 kernel/entry/common.c            |    4 +++-
 3 files changed, 35 insertions(+), 10 deletions(-)

--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -3,6 +3,7 @@
 #define __LINUX_IRQENTRYCOMMON_H
 
 #include
+#include <linux/hrtimer_rearm.h>
 #include
 #include
 #include
@@ -33,6 +34,14 @@
 	 _TIF_PATCH_PENDING | _TIF_NOTIFY_SIGNAL | _TIF_RSEQ |	\
 	 ARCH_EXIT_TO_USER_MODE_WORK)
 
+#ifdef CONFIG_HRTIMER_REARM_DEFERRED
+# define EXIT_TO_USER_MODE_WORK_SYSCALL	(EXIT_TO_USER_MODE_WORK)
+# define EXIT_TO_USER_MODE_WORK_IRQ	(EXIT_TO_USER_MODE_WORK | _TIF_HRTIMER_REARM)
+#else
+# define EXIT_TO_USER_MODE_WORK_SYSCALL	(EXIT_TO_USER_MODE_WORK)
+# define EXIT_TO_USER_MODE_WORK_IRQ	(EXIT_TO_USER_MODE_WORK)
+#endif
+
 /**
  * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs
  * @regs:	Pointer to currents pt_regs
@@ -203,6 +212,7 @@ unsigned long exit_to_user_mode_loop(str
 /**
  * __exit_to_user_mode_prepare - call exit_to_user_mode_loop() if required
  * @regs:	Pointer to pt_regs on entry stack
+ * @work_mask:	Which TIF bits need to be evaluated
  *
  * 1) check that interrupts are disabled
  * 2) call tick_nohz_user_enter_prepare()
@@ -212,7 +222,8 @@ unsigned long exit_to_user_mode_loop(str
 *
 * Don't invoke directly, use the syscall/irqentry_ prefixed variants below
 */
-static __always_inline void __exit_to_user_mode_prepare(struct pt_regs *regs)
+static __always_inline void __exit_to_user_mode_prepare(struct pt_regs *regs,
+							const unsigned long work_mask)
 {
 	unsigned long ti_work;
 
@@ -222,8 +233,10 @@ static __always_inline void __exit_to_us
 	tick_nohz_user_enter_prepare();
 
 	ti_work = read_thread_flags();
-	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
-		ti_work = exit_to_user_mode_loop(regs, ti_work);
+	if (unlikely(ti_work & work_mask)) {
+		if (!hrtimer_rearm_deferred_user_irq(&ti_work, work_mask))
+			ti_work = exit_to_user_mode_loop(regs, ti_work);
+	}
 
 	arch_exit_to_user_mode_prepare(regs, ti_work);
 }
@@ -239,7 +252,7 @@ static __always_inline void __exit_to_us
 /* Temporary workaround to keep ARM64 alive */
 static __always_inline void exit_to_user_mode_prepare_legacy(struct pt_regs *regs)
 {
-	__exit_to_user_mode_prepare(regs);
+	__exit_to_user_mode_prepare(regs, EXIT_TO_USER_MODE_WORK);
 	rseq_exit_to_user_mode_legacy();
 	__exit_to_user_mode_validate();
 }
@@ -253,7 +266,7 @@ static __always_inline void exit_to_user
 */
 static __always_inline void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
 {
-	__exit_to_user_mode_prepare(regs);
+	__exit_to_user_mode_prepare(regs, EXIT_TO_USER_MODE_WORK_SYSCALL);
 	rseq_syscall_exit_to_user_mode();
 	__exit_to_user_mode_validate();
 }
@@ -267,7 +280,7 @@ static __always_inline void syscall_exit
 */
 static __always_inline void irqentry_exit_to_user_mode_prepare(struct pt_regs *regs)
 {
-	__exit_to_user_mode_prepare(regs);
+	__exit_to_user_mode_prepare(regs, EXIT_TO_USER_MODE_WORK_IRQ);
 	rseq_irqentry_exit_to_user_mode();
 	__exit_to_user_mode_validate();
 }
--- a/include/linux/rseq_entry.h
+++ b/include/linux/rseq_entry.h
@@ -40,6 +40,7 @@ DECLARE_PER_CPU(struct rseq_stats, rseq_
 #endif /* !CONFIG_RSEQ_STATS */
 
 #ifdef CONFIG_RSEQ
+#include <linux/hrtimer_rearm.h>
 #include
 #include
 #include
@@ -110,7 +111,7 @@ static __always_inline void rseq_slice_c
 	t->rseq.slice.state.granted = false;
 }
 
-static __always_inline bool rseq_grant_slice_extension(bool work_pending)
+static __always_inline bool __rseq_grant_slice_extension(bool work_pending)
 {
 	struct task_struct *curr = current;
 	struct rseq_slice_ctrl usr_ctrl;
@@ -215,11 +216,20 @@ static __always_inline bool rseq_grant_s
 	return false;
 }
 
+static __always_inline bool rseq_grant_slice_extension(unsigned long ti_work, unsigned long mask)
+{
+	if (unlikely(__rseq_grant_slice_extension(ti_work & mask))) {
+		hrtimer_rearm_deferred_tif(ti_work);
+		return true;
+	}
+	return false;
+}
+
 #else /* CONFIG_RSEQ_SLICE_EXTENSION */
 static inline bool rseq_slice_extension_enabled(void) { return false; }
 static inline bool rseq_arm_slice_extension_timer(void) { return false; }
 static inline void rseq_slice_clear_grant(struct task_struct *t) { }
-static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
+static inline bool rseq_grant_slice_extension(unsigned long ti_work, unsigned long mask) { return false; }
 #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */
 
 bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs, unsigned long csaddr);
@@ -778,7 +788,7 @@ static inline void rseq_syscall_exit_to_
 static inline void rseq_irqentry_exit_to_user_mode(void) { }
 static inline void rseq_exit_to_user_mode_legacy(void) { }
 static inline void rseq_debug_syscall_return(struct pt_regs *regs) { }
-static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
+static inline bool rseq_grant_slice_extension(unsigned long ti_work, unsigned long mask) { return false; }
 #endif /* !CONFIG_RSEQ */
 
 #endif /* _LINUX_RSEQ_ENTRY_H */
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -50,7 +50,7 @@ static __always_inline unsigned long __e
 	local_irq_enable_exit_to_user(ti_work);
 
 	if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY)) {
-		if (!rseq_grant_slice_extension(ti_work & TIF_SLICE_EXT_DENY))
+		if (!rseq_grant_slice_extension(ti_work, TIF_SLICE_EXT_DENY))
 			schedule();
 	}
 
@@ -225,6 +225,7 @@ noinstr void irqentry_exit(struct pt_reg
 	 */
 	if (state.exit_rcu) {
 		instrumentation_begin();
+		hrtimer_rearm_deferred();
 		/* Tell the tracer that IRET will enable interrupts */
 		trace_hardirqs_on_prepare();
 		lockdep_hardirqs_on_prepare();
@@ -238,6 +239,7 @@ noinstr void irqentry_exit(struct pt_reg
 	if (IS_ENABLED(CONFIG_PREEMPTION))
 		irqentry_exit_cond_resched();
 
+	hrtimer_rearm_deferred();
 	/* Covers both tracing and lockdep */
 	trace_hardirqs_on();
 	instrumentation_end();
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:07 +0100
Message-ID: <20260224163431.142854488@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 36/48] softirq: Prepare for deferred hrtimer rearming
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

The hrtimer interrupt expires timers and, at the end of the interrupt,
rearms the clockevent device for the next expiring timer. That is obviously
correct, but in the case that an expired timer sets NEED_RESCHED, the
return from interrupt ends up in schedule(). If HRTICK is enabled,
schedule() will modify the hrtick timer, which causes another reprogramming
of the hardware.
That can be avoided by deferring the rearming to the return-from-interrupt
path; if the return results in an immediate schedule() invocation, the
rearming can be deferred until the end of schedule(), which avoids multiple
rearms and a re-evaluation of the next expiring timer.

In case the return from interrupt ends up handling softirqs before reaching
the rearm conditions in the return-to-user entry code, a deferred rearm has
to be handled before softirq handling enables interrupts, as soft interrupt
handling can take long and would therefore introduce hard-to-diagnose
latencies into the timer interrupt.

Place the (for now empty) stub call right before invoking the softirq
handling routine.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
tglx: Split out to make it simpler to review and to make cross-subsystem
merge logistics trivial.
---
 kernel/softirq.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -663,6 +663,13 @@ void irq_enter_rcu(void)
 {
 	__irq_enter_raw();
 
+	/*
+	 * If this is a nested interrupt that hits the exit_to_user_mode_loop
+	 * where it has enabled interrupts but before it has hit schedule() we
+	 * could have hrtimers in an undefined state. Fix it up here.
+	 */
+	hrtimer_rearm_deferred();
+
 	if (tick_nohz_full_cpu(smp_processor_id()) ||
 	    (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET)))
 		tick_irq_enter();
@@ -719,8 +726,14 @@ static inline void __irq_exit_rcu(void)
 #endif
 	account_hardirq_exit(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt() && local_softirq_pending())
+	if (!in_interrupt() && local_softirq_pending()) {
+		/*
+		 * If we left hrtimers unarmed, make sure to arm them now,
+		 * before enabling interrupts to run SoftIRQ.
+		 */
+		hrtimer_rearm_deferred();
 		invoke_softirq();
+	}
 
 	if (IS_ENABLED(CONFIG_IRQ_FORCED_THREADING) && force_irqthreads() &&
 	    local_timers_pending_force_th() && !(in_nmi() | in_hardirq()))
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:12 +0100
Message-ID: <20260224163431.208580085@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 37/48] sched/core: Prepare for deferred hrtimer rearming
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

The hrtimer interrupt expires timers and, at the end of the interrupt,
rearms the clockevent device for the next expiring timer. That is obviously
correct, but in the case that an expired timer sets NEED_RESCHED, the
return from interrupt ends up in schedule(). If HRTICK is enabled,
schedule() will modify the hrtick timer, which causes another reprogramming
of the hardware.

That can be avoided by deferring the rearming to the return-from-interrupt
path; if the return results in an immediate schedule() invocation, the
rearming can be deferred until the end of schedule(), which avoids multiple
rearms and a re-evaluation of the next expiring timer.

Add the rearm checks to the existing hrtick_schedule_enter/exit()
functions, which already handle the batched rearm of the hrtick timer.

For now this just places empty stubs at the right places, which are all
optimized out by the compiler until the guard condition becomes true.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
tglx: Split out to make it simpler to review and to make cross-subsystem
merge logistics trivial.
---
 kernel/sched/core.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -876,6 +876,7 @@ enum {
 	HRTICK_SCHED_NONE		= 0,
 	HRTICK_SCHED_DEFER		= BIT(1),
 	HRTICK_SCHED_START		= BIT(2),
+	HRTICK_SCHED_REARM_HRTIMER	= BIT(3)
 };
 
 static void hrtick_clear(struct rq *rq)
@@ -974,6 +975,8 @@ void hrtick_start(struct rq *rq, u64 del
 static inline void hrtick_schedule_enter(struct rq *rq)
 {
 	rq->hrtick_sched = HRTICK_SCHED_DEFER;
+	if (hrtimer_test_and_clear_rearm_deferred())
+		rq->hrtick_sched |= HRTICK_SCHED_REARM_HRTIMER;
 }
 
 static inline void hrtick_schedule_exit(struct rq *rq)
@@ -991,6 +994,9 @@ static inline void hrtick_schedule_exit(
 		hrtimer_cancel(&rq->hrtick_timer);
 	}
 
+	if (rq->hrtick_sched & HRTICK_SCHED_REARM_HRTIMER)
+		__hrtimer_rearm_deferred();
+
 	rq->hrtick_sched = HRTICK_SCHED_NONE;
 }
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:18 +0100
Message-ID: <20260224163431.273488269@kernel.org>
From: Thomas Gleixner
To: LKML
Subject: [patch 38/48] hrtimer: Push reprogramming timers into the interrupt return path
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

Currently hrtimer_interrupt() runs expired timers, which can re-arm themselves,
after which it computes the next expiration time and reprograms the
hardware.

However, things like HRTICK, a highres timer driving preemption, cannot
re-arm themselves at the point of running, since the next task has not been
determined yet. The schedule() in the interrupt return path will switch to
the next task, which then causes a new hrtimer to be programmed.

This results in reprogramming the hardware at least twice: once after
running the timers, and once upon selecting the new task. Notably, *both*
events happen in the interrupt.

By pushing the hrtimer reprogram all the way into the interrupt return
path, it runs after schedule() picks the new task, and the double reprogram
can be avoided.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/asm-generic/thread_info_tif.h |    5 +-
 include/linux/hrtimer_rearm.h         |   72 +++++++++++++++++++++++++++++++---
 kernel/time/Kconfig                   |    4 +
 kernel/time/hrtimer.c                 |   38 +++++++++++++++--
 4 files changed, 107 insertions(+), 12 deletions(-)

--- a/include/asm-generic/thread_info_tif.h
+++ b/include/asm-generic/thread_info_tif.h
@@ -41,11 +41,14 @@
 #define _TIF_PATCH_PENDING	BIT(TIF_PATCH_PENDING)
 
 #ifdef HAVE_TIF_RESTORE_SIGMASK
-# define TIF_RESTORE_SIGMASK	10	// Restore signal mask in do_signal() */
+# define TIF_RESTORE_SIGMASK	10	// Restore signal mask in do_signal()
 # define _TIF_RESTORE_SIGMASK	BIT(TIF_RESTORE_SIGMASK)
 #endif
 
 #define TIF_RSEQ		11	// Run RSEQ fast path
 #define _TIF_RSEQ		BIT(TIF_RSEQ)
 
+#define TIF_HRTIMER_REARM	12	// re-arm the timer
+#define _TIF_HRTIMER_REARM	BIT(TIF_HRTIMER_REARM)
+
 #endif /* _ASM_GENERIC_THREAD_INFO_TIF_H_ */
--- a/include/linux/hrtimer_rearm.h
+++ b/include/linux/hrtimer_rearm.h
@@ -3,12 +3,74 @@
 #define _LINUX_HRTIMER_REARM_H
 
 #ifdef CONFIG_HRTIMER_REARM_DEFERRED
-static __always_inline void __hrtimer_rearm_deferred(void) { }
-static __always_inline void hrtimer_rearm_deferred(void) { }
-static __always_inline void hrtimer_rearm_deferred_tif(unsigned long tif_work) { }
+#include
+
+void __hrtimer_rearm_deferred(void);
+
+/*
+ * This is purely CPU local, so check the TIF bit first to avoid the overhead of
+ * the atomic test_and_clear_bit() operation for the common case where the bit
+ * is not set.
+ */
+static __always_inline bool hrtimer_test_and_clear_rearm_deferred_tif(unsigned long tif_work)
+{
+	lockdep_assert_irqs_disabled();
+
+	if (unlikely(tif_work & _TIF_HRTIMER_REARM)) {
+		clear_thread_flag(TIF_HRTIMER_REARM);
+		return true;
+	}
+	return false;
+}
+
+#define TIF_REARM_MASK	(_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY | _TIF_HRTIMER_REARM)
+
+/* Invoked from the exit to user before invoking exit_to_user_mode_loop() */
 static __always_inline bool
-hrtimer_rearm_deferred_user_irq(unsigned long *tif_work, const unsigned long tif_mask) { return false; }
-static __always_inline bool hrtimer_test_and_clear_rearm_deferred(void) { return false; }
+hrtimer_rearm_deferred_user_irq(unsigned long *tif_work, const unsigned long tif_mask)
+{
+	/* Help the compiler to optimize the function out for syscall returns */
+	if (!(tif_mask & _TIF_HRTIMER_REARM))
+		return false;
+	/*
+	 * Rearm the timer if none of the resched flags is set before going into
+	 * the loop which re-enables interrupts.
+	 */
+	if (unlikely((*tif_work & TIF_REARM_MASK) == _TIF_HRTIMER_REARM)) {
+		clear_thread_flag(TIF_HRTIMER_REARM);
+		__hrtimer_rearm_deferred();
+		/* Don't go into the loop if HRTIMER_REARM was the only flag */
+		*tif_work &= ~_TIF_HRTIMER_REARM;
+		return !*tif_work;
+	}
+	return false;
+}
+
+/* Invoked from the time slice extension decision function */
+static __always_inline void hrtimer_rearm_deferred_tif(unsigned long tif_work)
+{
+	if (hrtimer_test_and_clear_rearm_deferred_tif(tif_work))
+		__hrtimer_rearm_deferred();
+}
+
+/*
+ * This is to be called on all irqentry_exit() paths that will enable
+ * interrupts.
+ */
+static __always_inline void hrtimer_rearm_deferred(void)
+{
+	hrtimer_rearm_deferred_tif(read_thread_flags());
+}
+
+/*
+ * Invoked from the scheduler on entry to __schedule() so it can defer
+ * rearming after the load balancing callbacks which might change hrtick.
+ */
+static __always_inline bool hrtimer_test_and_clear_rearm_deferred(void)
+{
+	return hrtimer_test_and_clear_rearm_deferred_tif(read_thread_flags());
+}
+
 #else /* CONFIG_HRTIMER_REARM_DEFERRED */
 static __always_inline void __hrtimer_rearm_deferred(void) { }
 static __always_inline void hrtimer_rearm_deferred(void) { }
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -60,7 +60,9 @@ config GENERIC_CMOS_UPDATE
 
 # Deferred rearming of the hrtimer interrupt
 config HRTIMER_REARM_DEFERRED
-	def_bool n
+	def_bool y
+	depends on GENERIC_ENTRY && HAVE_GENERIC_TIF_BITS
+	depends on HIGH_RES_TIMERS && SCHED_HRTICK
 
 # Select to handle posix CPU timers from task_work
 # and not from the timer interrupt context
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1939,10 +1939,9 @@ static __latent_entropy void hrtimer_run
 * Very similar to hrtimer_force_reprogram(), except it deals with
 * deferred_rearm and hang_detected.
 */
-static void hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now)
+static void hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now,
+			  ktime_t expires_next, bool deferred)
 {
-	ktime_t expires_next = hrtimer_update_next_event(cpu_base);
-
 	cpu_base->expires_next = expires_next;
 	cpu_base->deferred_rearm = false;
 
@@ -1954,9 +1953,37 @@ static void hrtimer_rearm(struct hrtimer
 		expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
 		cpu_base->hang_detected = false;
 	}
-	hrtimer_rearm_event(expires_next, false);
+	hrtimer_rearm_event(expires_next, deferred);
 }
 
+#ifdef CONFIG_HRTIMER_REARM_DEFERRED
+void __hrtimer_rearm_deferred(void)
+{
+	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+	ktime_t now, expires_next;
+
+	if (!cpu_base->deferred_rearm)
+		return;
+
+	guard(raw_spinlock)(&cpu_base->lock);
+	now = hrtimer_update_base(cpu_base);
+	expires_next = hrtimer_update_next_event(cpu_base);
+	hrtimer_rearm(cpu_base, now, expires_next, true);
+}
+
+static __always_inline void
+hrtimer_interrupt_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now, ktime_t expires_next)
+{
+	set_thread_flag(TIF_HRTIMER_REARM);
+}
+#else /* CONFIG_HRTIMER_REARM_DEFERRED */
+static __always_inline void
+hrtimer_interrupt_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now, ktime_t expires_next)
+{
+	hrtimer_rearm(cpu_base, now, expires_next, false);
+}
+#endif /* !CONFIG_HRTIMER_REARM_DEFERRED */
+
 /*
 * High resolution timer interrupt
 * Called with interrupts disabled
@@ -2014,9 +2041,10 @@ void hrtimer_interrupt(struct clock_even
 		cpu_base->hang_detected = true;
 	}
 
-	hrtimer_rearm(cpu_base, now);
+	hrtimer_interrupt_rearm(cpu_base, now, expires_next);
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 }
+
 #endif /* !CONFIG_HIGH_RES_TIMERS */
 
 /*
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:23 +0100
Message-ID:
<20260224163431.338569372@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 39/48] hrtimer: Avoid re-evaluation when nothing changed
References: <20260224163022.795809588@kernel.org>

Most of the time nothing changes between hrtimer_interrupt() deferring the
rearm and the invocation of hrtimer_rearm_deferred(). In those cases it's a
pointless exercise to re-evaluate the next expiring timer.

Cache the required data and use it if nothing changed.

Signed-off-by: Thomas Gleixner
---
 include/linux/hrtimer_defs.h |   53 +++++++++++++++++++++----------------------
 kernel/time/hrtimer.c        |   45 +++++++++++++++++++++++++-----------
 2 files changed, 58 insertions(+), 40 deletions(-)

--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -47,32 +47,31 @@ enum hrtimer_base_type {

 /**
  * struct hrtimer_cpu_base - the per cpu clock bases
- * @lock:		lock protecting the base and associated clock bases
- *			and timers
- * @cpu:		cpu number
- * @active_bases:	Bitfield to mark bases with active timers
- * @clock_was_set_seq:	Sequence counter of clock was set events
- * @hres_active:	State of high resolution mode
- * @deferred_rearm:	A deferred rearm is pending
- * @hang_detected:	The last hrtimer interrupt detected a hang
- * @softirq_activated:	displays, if the softirq is raised - update of softirq
- *			related settings is not required then.
- * @nr_events:		Total number of hrtimer interrupt events
- * @nr_retries:		Total number of hrtimer interrupt retries
- * @nr_hangs:		Total number of hrtimer interrupt hangs
- * @max_hang_time:	Maximum time spent in hrtimer_interrupt
- * @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are
- *			 expired
- * @online:		CPU is online from an hrtimers point of view
- * @timer_waiters:	A hrtimer_cancel() invocation waits for the timer
- *			callback to finish.
- * @expires_next:	absolute time of the next event, is required for remote
- *			hrtimer enqueue; it is the total first expiry time (hard
- *			and soft hrtimer are taken into account)
- * @next_timer:	Pointer to the first expiring timer
- * @softirq_expires_next: Time to check, if soft queues need also to be expired
- * @softirq_next_timer: Pointer to the first expiring softirq based timer
- * @clock_base:	array of clock bases for this cpu
+ * @lock:		lock protecting the base and associated clock bases and timers
+ * @cpu:		cpu number
+ * @active_bases:	Bitfield to mark bases with active timers
+ * @clock_was_set_seq:	Sequence counter of clock was set events
+ * @hres_active:	State of high resolution mode
+ * @deferred_rearm:	A deferred rearm is pending
+ * @deferred_needs_update: The deferred rearm must re-evaluate the first timer
+ * @hang_detected:	The last hrtimer interrupt detected a hang
+ * @softirq_activated:	displays, if the softirq is raised - update of softirq
+ *			related settings is not required then.
+ * @nr_events:		Total number of hrtimer interrupt events
+ * @nr_retries:		Total number of hrtimer interrupt retries
+ * @nr_hangs:		Total number of hrtimer interrupt hangs
+ * @max_hang_time:	Maximum time spent in hrtimer_interrupt
+ * @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are expired
+ * @online:		CPU is online from an hrtimers point of view
+ * @timer_waiters:	A hrtimer_cancel() invocation waits for the timer callback to finish.
+ * @expires_next:	Absolute time of the next event, is required for remote
+ *			hrtimer enqueue; it is the total first expiry time (hard
+ *			and soft hrtimer are taken into account)
+ * @next_timer:	Pointer to the first expiring timer
+ * @softirq_expires_next: Time to check, if soft queues need also to be expired
+ * @softirq_next_timer: Pointer to the first expiring softirq based timer
+ * @deferred_expires_next: Cached expires next value for deferred rearm
+ * @clock_base:	Array of clock bases for this cpu
  *
  * Note: next_timer is just an optimization for __remove_hrtimer().
  *	 Do not dereference the pointer because it is not reliable on
@@ -85,6 +84,7 @@ struct hrtimer_cpu_base {
 	unsigned int			clock_was_set_seq;
 	bool				hres_active;
 	bool				deferred_rearm;
+	bool				deferred_needs_update;
 	bool				hang_detected;
 	bool				softirq_activated;
 	bool				online;
@@ -102,6 +102,7 @@ struct hrtimer_cpu_base {
 	struct hrtimer			*next_timer;
 	ktime_t				softirq_expires_next;
 	struct hrtimer			*softirq_next_timer;
+	ktime_t				deferred_expires_next;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
 	call_single_data_t		csd;
 } ____cacheline_aligned;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -919,8 +919,10 @@ static bool update_needs_ipi(struct hrti
 		return false;

 	/* If a deferred rearm is pending the remote CPU will take care of it */
-	if (cpu_base->deferred_rearm)
+	if (cpu_base->deferred_rearm) {
+		cpu_base->deferred_needs_update = true;
 		return false;
+	}

 	/*
	 * Walk the affected clock bases and check whether the first expiring
@@ -1141,7 +1143,12 @@ static void __remove_hrtimer(struct hrti
	 * a local timer is removed to be immediately restarted. That's handled
	 * at the call site.
	 */
-	if (reprogram && timer == cpu_base->next_timer && !timer->is_lazy)
+	if (!reprogram || timer != cpu_base->next_timer || timer->is_lazy)
+		return;
+
+	if (cpu_base->deferred_rearm)
+		cpu_base->deferred_needs_update = true;
+	else
 		hrtimer_force_reprogram(cpu_base, /* skip_equal */ true);
 }

@@ -1328,8 +1335,10 @@ static bool __hrtimer_start_range_ns(str
 	}

 	/* If a deferred rearm is pending skip reprogramming the device */
-	if (cpu_base->deferred_rearm)
+	if (cpu_base->deferred_rearm) {
+		cpu_base->deferred_needs_update = true;
 		return false;
+	}

 	if (!was_first || cpu_base != this_cpu_base) {
 		/*
@@ -1939,8 +1948,7 @@ static __latent_entropy void hrtimer_run
  * Very similar to hrtimer_force_reprogram(), except it deals with
  * deferred_rearm and hang_detected.
  */
-static void hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now,
-			  ktime_t expires_next, bool deferred)
+static void hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t expires_next, bool deferred)
 {
 	cpu_base->expires_next = expires_next;
 	cpu_base->deferred_rearm = false;
@@ -1950,7 +1958,7 @@ static void hrtimer_rearm(struct hrtimer
	 * Give the system a chance to do something else than looping
	 * on hrtimer interrupts.
	 */
-		expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
+		expires_next = ktime_add_ns(ktime_get(), 100 * NSEC_PER_MSEC);
 		cpu_base->hang_detected = false;
 	}
 	hrtimer_rearm_event(expires_next, deferred);
@@ -1960,27 +1968,36 @@ static void hrtimer_rearm(struct hrtimer
 void __hrtimer_rearm_deferred(void)
 {
 	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
-	ktime_t now, expires_next;
+	ktime_t expires_next;

 	if (!cpu_base->deferred_rearm)
 		return;

 	guard(raw_spinlock)(&cpu_base->lock);
-	now = hrtimer_update_base(cpu_base);
-	expires_next = hrtimer_update_next_event(cpu_base);
-	hrtimer_rearm(cpu_base, now, expires_next, true);
+	if (cpu_base->deferred_needs_update) {
+		hrtimer_update_base(cpu_base);
+		expires_next = hrtimer_update_next_event(cpu_base);
+	} else {
+		/* No timer added/removed. Use the cached value */
+		expires_next = cpu_base->deferred_expires_next;
+	}
+	hrtimer_rearm(cpu_base, expires_next, true);
 }

 static __always_inline void
-hrtimer_interrupt_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now, ktime_t expires_next)
+hrtimer_interrupt_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t expires_next)
 {
+	/* hrtimer_interrupt() just re-evaluated the first expiring timer */
+	cpu_base->deferred_needs_update = false;
+	/* Cache the expiry time */
+	cpu_base->deferred_expires_next = expires_next;
 	set_thread_flag(TIF_HRTIMER_REARM);
 }
 #else /* CONFIG_HRTIMER_REARM_DEFERRED */
 static __always_inline void
-hrtimer_interrupt_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now, ktime_t expires_next)
+hrtimer_interrupt_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t expires_next)
 {
-	hrtimer_rearm(cpu_base, now, expires_next, false);
+	hrtimer_rearm(cpu_base, expires_next, false);
 }
 #endif /* !CONFIG_HRTIMER_REARM_DEFERRED */

@@ -2041,7 +2058,7 @@ void hrtimer_interrupt(struct clock_even
 		cpu_base->hang_detected = true;
 	}

-	hrtimer_interrupt_rearm(cpu_base, now, expires_next);
+	hrtimer_interrupt_rearm(cpu_base, expires_next);
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 }
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:28 +0100
Message-ID: <20260224163431.404839710@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 40/48] hrtimer: Keep track of first expiring timer per clock base
References: <20260224163022.795809588@kernel.org>

Evaluating the next expiry time of all clock bases is cache line expensive
as the expiry time of the first expiring timer is not cached in the base
and requires accessing the timer itself, which is definitely in a different
cache line.

It's way more efficient to keep track of the expiry time on enqueue and
dequeue operations as the relevant data is already in the cache at that
point.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/hrtimer_defs.h |    2 ++
 kernel/time/hrtimer.c        |   37 ++++++++++++++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -19,6 +19,7 @@
  *			timer to a base on another cpu.
  * @clockid:		clock id for per_cpu support
  * @seq:		seqcount around __run_hrtimer
+ * @expires_next:	Absolute time of the next event in this clock base
  * @running:		pointer to the currently running hrtimer
  * @active:		red black tree root node for the active timers
  * @offset:		offset of this clock to the monotonic base
@@ -28,6 +29,7 @@ struct hrtimer_clock_base {
 	unsigned int		index;
 	clockid_t		clockid;
 	seqcount_raw_spinlock_t	seq;
+	ktime_t			expires_next;
 	struct hrtimer		*running;
 	struct timerqueue_head	active;
 	ktime_t			offset;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1107,7 +1107,18 @@ static bool enqueue_hrtimer(struct hrtim
 	/* Pairs with the lockless read in hrtimer_is_queued() */
 	WRITE_ONCE(timer->is_queued, HRTIMER_STATE_ENQUEUED);

-	return timerqueue_add(&base->active, &timer->node);
+	if (!timerqueue_add(&base->active, &timer->node))
+		return false;
+
+	base->expires_next = hrtimer_get_expires(timer);
+	return true;
+}
+
+static inline void base_update_next_timer(struct hrtimer_clock_base *base)
+{
+	struct timerqueue_node *next = timerqueue_getnext(&base->active);
+
+	base->expires_next = next ? next->expires : KTIME_MAX;
 }

 /*
@@ -1122,6 +1133,7 @@ static void __remove_hrtimer(struct hrti
			     bool newstate, bool reprogram)
 {
 	struct hrtimer_cpu_base *cpu_base = base->cpu_base;
+	bool was_first;

 	lockdep_assert_held(&cpu_base->lock);

@@ -1131,9 +1143,17 @@ static void __remove_hrtimer(struct hrti
 	/* Pairs with the lockless read in hrtimer_is_queued() */
 	WRITE_ONCE(timer->is_queued, newstate);

+	was_first = &timer->node == timerqueue_getnext(&base->active);
+
 	if (!timerqueue_del(&base->active, &timer->node))
 		cpu_base->active_bases &= ~(1 << base->index);

+	/* Nothing to update if this was not the first timer in the base */
+	if (!was_first)
+		return;
+
+	base_update_next_timer(base);
+
	/*
	 * If reprogram is false don't update cpu_base->next_timer and do not
	 * touch the clock event device.
@@ -1182,9 +1202,12 @@ static inline bool
 remove_and_enqueue_same_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
			     const enum hrtimer_mode mode, ktime_t expires, u64 delta_ns)
 {
+	bool was_first = false;
+
 	/* Remove it from the timer queue if active */
 	if (timer->is_queued) {
 		debug_hrtimer_deactivate(timer);
+		was_first = &timer->node == timerqueue_getnext(&base->active);
 		timerqueue_del(&base->active, &timer->node);
 	}

@@ -1197,8 +1220,16 @@ remove_and_enqueue_same_base(struct hrti
 	/* Pairs with the lockless read in hrtimer_is_queued() */
 	WRITE_ONCE(timer->is_queued, HRTIMER_STATE_ENQUEUED);

-	/* Returns true if this is the first expiring timer */
-	return timerqueue_add(&base->active, &timer->node);
+	/* If it's the first expiring timer now or again, update base */
+	if (timerqueue_add(&base->active, &timer->node)) {
+		base->expires_next = expires;
+		return true;
+	}
+
+	if (was_first)
+		base_update_next_timer(base);
+
+	return false;
 }

 static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim,
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:33 +0100
Message-ID: <20260224163431.468186893@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 41/48] hrtimer: Rework next event evaluation
References: <20260224163022.795809588@kernel.org>

The per clock base cached expiry time allows a more efficient evaluation
of the next expiry on a CPU.
Separate the reprogramming evaluation from the NOHZ idle evaluation, which
needs to exclude the NOHZ timer, to keep the reprogramming path lean and
clean.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c |  120 ++++++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 51 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -546,49 +546,67 @@ static struct hrtimer_clock_base *
 #define for_each_active_base(base, cpu_base, active)	\
	while ((base = __next_base((cpu_base), &(active))))

-static ktime_t __hrtimer_next_event_base(struct hrtimer_cpu_base *cpu_base,
-					 const struct hrtimer *exclude,
-					 unsigned int active, ktime_t expires_next)
+#if defined(CONFIG_NO_HZ_COMMON)
+/*
+ * Same as hrtimer_bases_next_event() below, but skips the excluded timer and
+ * does not update cpu_base->next_timer/expires.
+ */
+static ktime_t hrtimer_bases_next_event_without(struct hrtimer_cpu_base *cpu_base,
+						const struct hrtimer *exclude,
+						unsigned int active, ktime_t expires_next)
 {
 	struct hrtimer_clock_base *base;
 	ktime_t expires;

+	lockdep_assert_held(&cpu_base->lock);
+
 	for_each_active_base(base, cpu_base, active) {
-		struct timerqueue_node *next;
-		struct hrtimer *timer;
+		expires = ktime_sub(base->expires_next, base->offset);
+		if (expires >= expires_next)
+			continue;

-		next = timerqueue_getnext(&base->active);
-		timer = container_of(next, struct hrtimer, node);
-		if (timer == exclude) {
-			/* Get to the next timer in the queue. */
-			next = timerqueue_iterate_next(next);
-			if (!next)
-				continue;
+		/*
+		 * If the excluded timer is the first on this base evaluate the
+		 * next timer.
+		 */
+		struct timerqueue_node *node = timerqueue_getnext(&base->active);

-			timer = container_of(next, struct hrtimer, node);
+		if (unlikely(&exclude->node == node)) {
+			node = timerqueue_iterate_next(node);
+			if (!node)
+				continue;
+			expires = ktime_sub(node->expires, base->offset);
+			if (expires >= expires_next)
+				continue;
 		}
-		expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
-		if (expires < expires_next) {
-			expires_next = expires;
+		expires_next = expires;
+	}
+	/* If base->offset changed, the result might be negative */
+	return max(expires_next, 0);
+}
+#endif

-			/* Skip cpu_base update if a timer is being excluded. */
-			if (exclude)
-				continue;
+static __always_inline struct hrtimer *clock_base_next_timer(struct hrtimer_clock_base *base)
+{
+	struct timerqueue_node *next = timerqueue_getnext(&base->active);
+
+	return container_of(next, struct hrtimer, node);
+}

-			if (timer->is_soft)
-				cpu_base->softirq_next_timer = timer;
-			else
-				cpu_base->next_timer = timer;
+/* Find the base with the earliest expiry */
+static void hrtimer_bases_first(struct hrtimer_cpu_base *cpu_base, unsigned int active,
+				ktime_t *expires_next, struct hrtimer **next_timer)
+{
+	struct hrtimer_clock_base *base;
+	ktime_t expires;
+
+	for_each_active_base(base, cpu_base, active) {
+		expires = ktime_sub(base->expires_next, base->offset);
+		if (expires < *expires_next) {
+			*expires_next = expires;
+			*next_timer = clock_base_next_timer(base);
 		}
 	}
-	/*
-	 * clock_was_set() might have changed base->offset of any of
-	 * the clock bases so the result might be negative. Fix it up
-	 * to prevent a false positive in clockevents_program_event().
-	 */
-	if (expires_next < 0)
-		expires_next = 0;
-	return expires_next;
 }

 /*
@@ -617,19 +635,22 @@ static ktime_t __hrtimer_get_next_event(
 	ktime_t expires_next = KTIME_MAX;
 	unsigned int active;

+	lockdep_assert_held(&cpu_base->lock);
+
 	if (!cpu_base->softirq_activated && (active_mask & HRTIMER_ACTIVE_SOFT)) {
 		active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
-		cpu_base->softirq_next_timer = NULL;
-		expires_next = __hrtimer_next_event_base(cpu_base, NULL, active, KTIME_MAX);
-		next_timer = cpu_base->softirq_next_timer;
+		if (active)
+			hrtimer_bases_first(cpu_base, active, &expires_next, &next_timer);
+		cpu_base->softirq_next_timer = next_timer;
 	}

 	if (active_mask & HRTIMER_ACTIVE_HARD) {
 		active = cpu_base->active_bases & HRTIMER_ACTIVE_HARD;
+		if (active)
+			hrtimer_bases_first(cpu_base, active, &expires_next, &next_timer);
 		cpu_base->next_timer = next_timer;
-		expires_next = __hrtimer_next_event_base(cpu_base, NULL, active, expires_next);
 	}
-	return expires_next;
+	return max(expires_next, 0);
 }

 static ktime_t hrtimer_update_next_event(struct hrtimer_cpu_base *cpu_base)
@@ -724,11 +745,7 @@ static void __hrtimer_reprogram(struct h
 	hrtimer_rearm_event(expires_next, false);
 }

-/*
- * Reprogram the event source with checking both queues for the
- * next event
- * Called with interrupts disabled and base->lock held
- */
+/* Reprogram the event source with an evaluation of all clock bases */
 static void hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, bool skip_equal)
 {
 	ktime_t expires_next = hrtimer_update_next_event(cpu_base);
@@ -1662,19 +1679,20 @@ u64 hrtimer_next_event_without(const str
 {
 	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
 	u64 expires = KTIME_MAX;
+	unsigned int active;

 	guard(raw_spinlock_irqsave)(&cpu_base->lock);
-	if (hrtimer_hres_active(cpu_base)) {
-		unsigned int active;
+	if (!hrtimer_hres_active(cpu_base))
+		return expires;

-		if (!cpu_base->softirq_activated) {
-			active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
-			expires = __hrtimer_next_event_base(cpu_base, exclude, active, KTIME_MAX);
-		}
-		active = cpu_base->active_bases & HRTIMER_ACTIVE_HARD;
-		expires = __hrtimer_next_event_base(cpu_base, exclude, active, expires);
-	}
-	return expires;
+	active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT;
+	if (active && !cpu_base->softirq_activated)
+		expires = hrtimer_bases_next_event_without(cpu_base, exclude, active, KTIME_MAX);
+
+	active = cpu_base->active_bases & HRTIMER_ACTIVE_HARD;
+	if (!active)
+		return expires;
+	return hrtimer_bases_next_event_without(cpu_base, exclude, active, expires);
 }
 #endif
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:37 +0100
Message-ID: <20260224163431.532927977@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, John Stultz, Stephen Boyd, Daniel Lezcano, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, x86@kernel.org, Peter Zijlstra, Frederic Weisbecker, Eric Dumazet
Subject: [patch 42/48] hrtimer: Simplify run_hrtimer_queues()
References: <20260224163022.795809588@kernel.org>

Replace the open-coded container_of() orgy with a trivial
clock_base_next_timer() helper.
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1933,6 +1933,13 @@ static void __run_hrtimer(struct hrtimer
 	base->running = NULL;
 }

+static __always_inline struct hrtimer *clock_base_next_timer_safe(struct hrtimer_clock_base *base)
+{
+	struct timerqueue_node *next = timerqueue_getnext(&base->active);
+
+	return next ? container_of(next, struct hrtimer, node) : NULL;
+}
+
 static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now,
				 unsigned long flags, unsigned int active_mask)
 {
@@ -1940,16 +1947,10 @@ static void __hrtimer_run_queues(struct
 	struct hrtimer_clock_base *base;

 	for_each_active_base(base, cpu_base, active) {
-		struct timerqueue_node *node;
-		ktime_t basenow;
-
-		basenow = ktime_add(now, base->offset);
-
-		while ((node = timerqueue_getnext(&base->active))) {
-			struct hrtimer *timer;
-
-			timer = container_of(node, struct hrtimer, node);
+		ktime_t basenow = ktime_add(now, base->offset);
+		struct hrtimer *timer;

+		while ((timer = clock_base_next_timer(base))) {
			/*
			 * The immediate goal for using the softexpires is
			 * minimizing wakeups, not running timers at the
From nobody Fri Apr 17 00:16:24 2026
b=kXYIaha9Xpxwu2N25WG6pHWcjiTXw5Imspv+RswOKy6t+68/7bgANdo857W/HAK/Ib0iGL1gGbrBXrefimVlJpL/KDbo9Q64nsDUglCKYhOQRTUB6vG8oROc3X2jJxm4ElhKY/Amjhr7hNaJCR2L6onk+69FD8M96MI2mP48RIA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771951125; c=relaxed/simple; bh=6fDl7jGUwqF0HXeZoXbzJc6gvvPqvyoTJsnltVER2lI=; h=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=aLd0rGSIK251sx4O6fYjks4Pl+oaxIShDwG6mv1Xo0GhigEq49SXWsamXKTd4lAR92Qlqiku6nnWHRCvSldNs3//lPwYWXP2wvfUp6g87IVzzgumBFlHejmzTDsr13xtZksiKbMfj1OfxqpKu2W+O+jRsXC8lbbs3lLFYnOHP7g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=jIL4L3GI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="jIL4L3GI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F3BB8C116D0; Tue, 24 Feb 2026 16:38:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1771951125; bh=6fDl7jGUwqF0HXeZoXbzJc6gvvPqvyoTJsnltVER2lI=; h=Date:From:To:Cc:Subject:References:From; b=jIL4L3GIkKG3P17ujrA1RzVKlD57iwJz7C/ptNdkyKu9f04FSOR7S5xXxr3tuURbD Cj8oAZ2WdZz3nw8CE8JlchtBTUTsX+lrwFVEhN1NjZWIJCuSUrWizyfgABHeEhK8bN PCUYC9/wFFS1Tj2Gpn7KRsVZcK+8CPgb+ofZMAv4p9geBbqKG8j7opzyzlJm9Nl5xi ymW5iLAu0iUr5tk7wHXDDahdwPZ8veRtJgdUUvfD70Llz07aFjw5qjTc9rxjvdWlPm COu+DFFRkyN4OA7qs832t0bye10hbSAB6oNSX8aQOAzlZDRQ4PfDdpKvnjk8XnVAyr hqvJzs1yfmElA== Date: Tue, 24 Feb 2026 17:38:42 +0100 Message-ID: <20260224163431.599804894@kernel.org> User-Agent: quilt/0.68 From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , John Stultz , Stephen Boyd , Daniel Lezcano , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , x86@kernel.org, Peter Zijlstra , Frederic Weisbecker , Eric Dumazet Subject: [patch 43/48] hrtimer: Optimize 
for_each_active_base()
References: <20260224163022.795809588@kernel.org>

Give the compiler some help to emit way better code.

Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/time/hrtimer.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -529,22 +529,10 @@ static inline void debug_activate(struct
 	trace_hrtimer_start(timer, mode, was_armed);
 }
 
-static struct hrtimer_clock_base *
-__next_base(struct hrtimer_cpu_base *cpu_base, unsigned int *active)
-{
-	unsigned int idx;
-
-	if (!*active)
-		return NULL;
-
-	idx = __ffs(*active);
-	*active &= ~(1U << idx);
-
-	return &cpu_base->clock_base[idx];
-}
-
-#define for_each_active_base(base, cpu_base, active)	\
-	while ((base = __next_base((cpu_base), &(active))))
+#define for_each_active_base(base, cpu_base, active)				\
+	for (unsigned int idx = ffs(active); idx--; idx = ffs((active)))	\
+		for (bool done = false; !done; active &= ~(1U << idx))		\
+			for (base = &cpu_base->clock_base[idx]; !done; done = true)
 
 #if defined(CONFIG_NO_HZ_COMMON)
 /*
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:47 +0100
Message-ID: <20260224163431.668401024@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Subject: [patch 44/48] rbtree: Provide rbtree with
links
References: <20260224163022.795809588@kernel.org>

Some RB tree users require quick access to the next and the previous
node, e.g. to check whether a modification of a node results in a change
of the node's position in the tree. If the position does not change, then
the modification can happen in place without going through a full
dequeue/requeue cycle.

An upcoming use case for this are the timer queues of the hrtimer
subsystem, as they can optimize for timers which are frequently rearmed
while enqueued.

This can obviously be achieved with rb_next() and rb_prev(), but those
turned out to be quite expensive for hotpath operations, depending on the
tree depth.

Add a linked RB tree variant where add() and erase() maintain the links
between the nodes. Like the cached variant it provides a pointer to the
leftmost node in the root.

It intentionally does not use a [h]list head as there is no real need for
true list operations: the list is strictly coupled to the tree and cannot
be manipulated independently. It sets the previous pointer of the
leftmost node and the next pointer of the rightmost node to NULL. This
allows a quick check, especially for the leftmost node, without
consulting the list head address, which creates better code.

Aside from the rb_leftmost cached pointer this could trivially provide a
rb_rightmost pointer as well, but there is no usage for that (yet).
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 include/linux/rbtree.h       | 81 +++++++++++++++++++++++++++++++++++++-----
 include/linux/rbtree_types.h | 16 ++++++++
 lib/rbtree.c                 | 17 +++++++++
 3 files changed, 105 insertions(+), 9 deletions(-)

--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -35,10 +35,15 @@
 #define RB_CLEAR_NODE(node)  \
 	((node)->__rb_parent_color = (unsigned long)(node))
 
+#define RB_EMPTY_LINKED_NODE(lnode)	RB_EMPTY_NODE(&(lnode)->node)
+#define RB_CLEAR_LINKED_NODE(lnode)	({		\
+	RB_CLEAR_NODE(&(lnode)->node);			\
+	(lnode)->prev = (lnode)->next = NULL;		\
+})
 
 extern void rb_insert_color(struct rb_node *, struct rb_root *);
 extern void rb_erase(struct rb_node *, struct rb_root *);
-
+extern bool rb_erase_linked(struct rb_node_linked *, struct rb_root_linked *);
 
 /* Find logical next and previous nodes in a tree */
 extern struct rb_node *rb_next(const struct rb_node *);
@@ -213,15 +218,10 @@ rb_add_cached(struct rb_node *node, stru
 	return leftmost ?
			 node : NULL;
 }
 
-/**
- * rb_add() - insert @node into @tree
- * @node: node to insert
- * @tree: tree to insert @node into
- * @less: operator defining the (partial) node order
- */
 static __always_inline void
-rb_add(struct rb_node *node, struct rb_root *tree,
-       bool (*less)(struct rb_node *, const struct rb_node *))
+__rb_add(struct rb_node *node, struct rb_root *tree,
+	 bool (*less)(struct rb_node *, const struct rb_node *),
+	 void (*linkop)(struct rb_node *, struct rb_node *, struct rb_node **))
 {
 	struct rb_node **link = &tree->rb_node;
 	struct rb_node *parent = NULL;
@@ -234,10 +234,73 @@ rb_add(struct rb_node *node, struct rb_r
 		link = &parent->rb_right;
 	}
 
+	linkop(node, parent, link);
 	rb_link_node(node, parent, link);
 	rb_insert_color(node, tree);
 }
 
+#define __node_2_linked_node(_n) \
+	rb_entry((_n), struct rb_node_linked, node)
+
+static inline void
+rb_link_linked_node(struct rb_node *node, struct rb_node *parent, struct rb_node **link)
+{
+	if (!parent)
+		return;
+
+	struct rb_node_linked *nnew = __node_2_linked_node(node);
+	struct rb_node_linked *npar = __node_2_linked_node(parent);
+
+	if (link == &parent->rb_left) {
+		nnew->prev = npar->prev;
+		nnew->next = npar;
+		npar->prev = nnew;
+		if (nnew->prev)
+			nnew->prev->next = nnew;
+	} else {
+		nnew->next = npar->next;
+		nnew->prev = npar;
+		npar->next = nnew;
+		if (nnew->next)
+			nnew->next->prev = nnew;
+	}
+}
+
+/**
+ * rb_add_linked() - insert @node into the leftmost linked tree @tree
+ * @node: node to insert
+ * @tree: linked tree to insert @node into
+ * @less: operator defining the (partial) node order
+ *
+ * Returns @true when @node is the new leftmost, @false otherwise.
+ */
+static __always_inline bool
+rb_add_linked(struct rb_node_linked *node, struct rb_root_linked *tree,
+	      bool (*less)(struct rb_node *, const struct rb_node *))
+{
+	__rb_add(&node->node, &tree->rb_root, less, rb_link_linked_node);
+	if (!node->prev)
+		tree->rb_leftmost = node;
+	return !node->prev;
+}
+
+/* Empty linkop function which is optimized away by the compiler */
+static __always_inline void
+rb_link_noop(struct rb_node *n, struct rb_node *p, struct rb_node **l) { }
+
+/**
+ * rb_add() - insert @node into @tree
+ * @node: node to insert
+ * @tree: tree to insert @node into
+ * @less: operator defining the (partial) node order
+ */
+static __always_inline void
+rb_add(struct rb_node *node, struct rb_root *tree,
+       bool (*less)(struct rb_node *, const struct rb_node *))
+{
+	__rb_add(node, tree, less, rb_link_noop);
+}
+
 /**
  * rb_find_add_cached() - find equivalent @node in @tree, or add @node
  * @node: node to look-for / insert
--- a/include/linux/rbtree_types.h
+++ b/include/linux/rbtree_types.h
@@ -9,6 +9,12 @@ struct rb_node {
 } __attribute__((aligned(sizeof(long))));
 /* The alignment might seem pointless, but allegedly CRIS needs it */
 
+struct rb_node_linked {
+	struct rb_node node;
+	struct rb_node_linked *prev;
+	struct rb_node_linked *next;
+};
+
 struct rb_root {
 	struct rb_node *rb_node;
 };
@@ -28,7 +34,17 @@ struct rb_root_cached {
 	struct rb_node *rb_leftmost;
 };
 
+/*
+ * Leftmost tree with links. This would allow a trivial rb_rightmost update,
+ * but that has been omitted due to the lack of users.
+ */
+struct rb_root_linked {
+	struct rb_root rb_root;
+	struct rb_node_linked *rb_leftmost;
+};
+
 #define RB_ROOT (struct rb_root) { NULL, }
 #define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }
+#define RB_ROOT_LINKED (struct rb_root_linked) { {NULL, }, NULL }
 
 #endif
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -446,6 +446,23 @@ void rb_erase(struct rb_node *node, stru
 }
 EXPORT_SYMBOL(rb_erase);
 
+bool rb_erase_linked(struct rb_node_linked *node, struct rb_root_linked *root)
+{
+	if (node->prev)
+		node->prev->next = node->next;
+	else
+		root->rb_leftmost = node->next;
+
+	if (node->next)
+		node->next->prev = node->prev;
+
+	rb_erase(&node->node, &root->rb_root);
+	RB_CLEAR_LINKED_NODE(node);
+
+	return !!root->rb_leftmost;
+}
+EXPORT_SYMBOL_GPL(rb_erase_linked);
+
 /*
  * Augmented rbtree manipulation functions.
  *
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:52 +0100
Message-ID: <20260224163431.734827095@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Subject: [patch 45/48] timerqueue: Provide linked timerqueue
References: <20260224163022.795809588@kernel.org>

The hrtimer subsystem wants to peek ahead to the next and previous timer
to evaluate whether a to-be-rearmed timer can stay at the same position
in the RB tree with the new expiry time. The linked RB tree provides the
infrastructure for this as it maintains links to the previous and next
nodes for each entry in the tree.
Provide timerqueue wrappers around that.

Signed-off-by: Thomas Gleixner
---
 include/linux/timerqueue.h       | 56 +++++++++++++++++++++++++++++++++-----
 include/linux/timerqueue_types.h | 15 ++++++++--
 lib/timerqueue.c                 | 14 +++++++++
 3 files changed, 74 insertions(+), 11 deletions(-)

--- a/include/linux/timerqueue.h
+++ b/include/linux/timerqueue.h
@@ -5,12 +5,11 @@
 #include
 #include
 
-extern bool timerqueue_add(struct timerqueue_head *head,
-			   struct timerqueue_node *node);
-extern bool timerqueue_del(struct timerqueue_head *head,
-			   struct timerqueue_node *node);
-extern struct timerqueue_node *timerqueue_iterate_next(
-	struct timerqueue_node *node);
+bool timerqueue_add(struct timerqueue_head *head, struct timerqueue_node *node);
+bool timerqueue_del(struct timerqueue_head *head, struct timerqueue_node *node);
+struct timerqueue_node *timerqueue_iterate_next(struct timerqueue_node *node);
+
+bool timerqueue_linked_add(struct timerqueue_linked_head *head, struct timerqueue_linked_node *node);
 
 /**
  * timerqueue_getnext - Returns the timer with the earliest expiration time
@@ -19,8 +18,7 @@ extern struct timerqueue_node *timerqueu
  *
  * Returns a pointer to the timer node that has the earliest expiration time.
 */
-static inline
-struct timerqueue_node *timerqueue_getnext(struct timerqueue_head *head)
+static inline struct timerqueue_node *timerqueue_getnext(struct timerqueue_head *head)
 {
 	struct rb_node *leftmost = rb_first_cached(&head->rb_root);
 
@@ -41,4 +39,46 @@ static inline void timerqueue_init_head(
 {
 	head->rb_root = RB_ROOT_CACHED;
 }
+
+/* Timer queues with linked nodes */
+
+static __always_inline
+struct timerqueue_linked_node *timerqueue_linked_first(struct timerqueue_linked_head *head)
+{
+	return rb_entry_safe(head->rb_root.rb_leftmost, struct timerqueue_linked_node, node);
+}
+
+static __always_inline
+struct timerqueue_linked_node *timerqueue_linked_next(struct timerqueue_linked_node *node)
+{
+	return rb_entry_safe(node->node.next, struct timerqueue_linked_node, node);
+}
+
+static __always_inline
+struct timerqueue_linked_node *timerqueue_linked_prev(struct timerqueue_linked_node *node)
+{
+	return rb_entry_safe(node->node.prev, struct timerqueue_linked_node, node);
+}
+
+static __always_inline
+bool timerqueue_linked_del(struct timerqueue_linked_head *head, struct timerqueue_linked_node *node)
+{
+	return rb_erase_linked(&node->node, &head->rb_root);
+}
+
+static __always_inline void timerqueue_linked_init(struct timerqueue_linked_node *node)
+{
+	RB_CLEAR_LINKED_NODE(&node->node);
+}
+
+static __always_inline bool timerqueue_linked_node_queued(struct timerqueue_linked_node *node)
+{
+	return !RB_EMPTY_LINKED_NODE(&node->node);
+}
+
+static __always_inline void timerqueue_linked_init_head(struct timerqueue_linked_head *head)
+{
+	head->rb_root = RB_ROOT_LINKED;
+}
+
 #endif /* _LINUX_TIMERQUEUE_H */
--- a/include/linux/timerqueue_types.h
+++ b/include/linux/timerqueue_types.h
@@ -6,12 +6,21 @@
 #include
 
 struct timerqueue_node {
-	struct rb_node node;
-	ktime_t expires;
+	struct rb_node		node;
+	ktime_t			expires;
 };
 
 struct timerqueue_head {
-	struct rb_root_cached rb_root;
+	struct rb_root_cached	rb_root;
+};
+
+struct
 timerqueue_linked_node {
+	struct rb_node_linked	node;
+	ktime_t			expires;
+};
+
+struct timerqueue_linked_head {
+	struct rb_root_linked	rb_root;
 };
 
 #endif /* _LINUX_TIMERQUEUE_TYPES_H */
--- a/lib/timerqueue.c
+++ b/lib/timerqueue.c
@@ -82,3 +82,17 @@ struct timerqueue_node *timerqueue_itera
 	return container_of(next, struct timerqueue_node, node);
 }
 EXPORT_SYMBOL_GPL(timerqueue_iterate_next);
+
+#define __node_2_tq_linked(_n) \
+	container_of(rb_entry((_n), struct rb_node_linked, node), struct timerqueue_linked_node, node)
+
+static __always_inline bool __tq_linked_less(struct rb_node *a, const struct rb_node *b)
+{
+	return __node_2_tq_linked(a)->expires < __node_2_tq_linked(b)->expires;
+}
+
+bool timerqueue_linked_add(struct timerqueue_linked_head *head, struct timerqueue_linked_node *node)
+{
+	return rb_add_linked(&node->node, &head->rb_root, __tq_linked_less);
+}
+EXPORT_SYMBOL_GPL(timerqueue_linked_add);
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:38:57 +0100
Message-ID: <20260224163431.806643179@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Subject: [patch 46/48] hrtimer: Use linked timerqueue
References: <20260224163022.795809588@kernel.org>

To prepare for optimizing the rearming of enqueued timers, switch to the
linked timerqueue.
That allows checking whether a new expiry time changes the position of
the timer in the RB tree, by comparing it against the expiry times of the
previous and the next timer.

Signed-off-by: Thomas Gleixner
---
 include/linux/hrtimer_defs.h  | 16 ++++++++--------
 include/linux/hrtimer_types.h |  8 ++++----
 kernel/time/hrtimer.c         | 34 +++++++++++++++++-----------------
 kernel/time/timer_list.c      | 10 ++++------
 4 files changed, 33 insertions(+), 35 deletions(-)

--- a/include/linux/hrtimer_defs.h
+++ b/include/linux/hrtimer_defs.h
@@ -25,14 +25,14 @@
  * @offset:	offset of this clock to the monotonic base
  */
 struct hrtimer_clock_base {
-	struct hrtimer_cpu_base	*cpu_base;
-	unsigned int		index;
-	clockid_t		clockid;
-	seqcount_raw_spinlock_t	seq;
-	ktime_t			expires_next;
-	struct hrtimer		*running;
-	struct timerqueue_head	active;
-	ktime_t			offset;
+	struct hrtimer_cpu_base		*cpu_base;
+	unsigned int			index;
+	clockid_t			clockid;
+	seqcount_raw_spinlock_t		seq;
+	ktime_t				expires_next;
+	struct hrtimer			*running;
+	struct timerqueue_linked_head	active;
+	ktime_t				offset;
 } __hrtimer_clock_base_align;
 
 enum hrtimer_base_type {
--- a/include/linux/hrtimer_types.h
+++ b/include/linux/hrtimer_types.h
@@ -17,7 +17,7 @@ enum hrtimer_restart {
 
 /**
  * struct hrtimer - the basic hrtimer structure
- * @node:	timerqueue node, which also manages node.expires,
+ * @node:	Linked timerqueue node, which also manages node.expires,
  *		the absolute expiry time in the hrtimers internal
  *		representation. The time is related to the clock on
  *		which the timer is based.
 Is setup by adding
@@ -39,15 +39,15 @@ enum hrtimer_restart {
  * The hrtimer structure must be initialized by hrtimer_setup()
  */
 struct hrtimer {
-	struct timerqueue_node		node;
-	ktime_t				_softexpires;
-	enum hrtimer_restart		(*__private function)(struct hrtimer *);
+	struct timerqueue_linked_node	node;
 	struct hrtimer_clock_base	*base;
 	bool				is_queued;
 	bool				is_rel;
 	bool				is_soft;
 	bool				is_hard;
 	bool				is_lazy;
+	ktime_t				_softexpires;
+	enum hrtimer_restart		(*__private function)(struct hrtimer *);
 };
 
 #endif /* _LINUX_HRTIMER_TYPES_H */
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -557,10 +557,10 @@ static ktime_t hrtimer_bases_next_event_
	 * If the excluded timer is the first on this base evaluate the
	 * next timer.
	 */
-	struct timerqueue_node *node = timerqueue_getnext(&base->active);
+	struct timerqueue_linked_node *node = timerqueue_linked_first(&base->active);
 
	if (unlikely(&exclude->node == node)) {
-		node = timerqueue_iterate_next(node);
+		node = timerqueue_linked_next(node);
		if (!node)
			continue;
		expires = ktime_sub(node->expires, base->offset);
@@ -576,7 +576,7 @@ static ktime_t hrtimer_bases_next_event_
 
 static __always_inline struct hrtimer *clock_base_next_timer(struct hrtimer_clock_base *base)
 {
-	struct timerqueue_node *next = timerqueue_getnext(&base->active);
+	struct timerqueue_linked_node *next = timerqueue_linked_first(&base->active);
 
 	return container_of(next, struct hrtimer, node);
 }
@@ -938,9 +938,9 @@ static bool update_needs_ipi(struct hrti
 	active &= cpu_base->active_bases;
 
 	for_each_active_base(base, cpu_base, active) {
-		struct timerqueue_node *next;
+		struct timerqueue_linked_node *next;
 
-		next = timerqueue_getnext(&base->active);
+		next = timerqueue_linked_first(&base->active);
 		expires = ktime_sub(next->expires, base->offset);
 		if (expires < cpu_base->expires_next)
 			return true;
@@ -1112,7 +1112,7 @@ static bool enqueue_hrtimer(struct hrtim
 	/* Pairs with the lockless read in
 hrtimer_is_queued() */
 	WRITE_ONCE(timer->is_queued, HRTIMER_STATE_ENQUEUED);
 
-	if (!timerqueue_add(&base->active, &timer->node))
+	if (!timerqueue_linked_add(&base->active, &timer->node))
 		return false;
 
 	base->expires_next = hrtimer_get_expires(timer);
@@ -1121,7 +1121,7 @@ static bool enqueue_hrtim
 
 static inline void base_update_next_timer(struct hrtimer_clock_base *base)
 {
-	struct timerqueue_node *next = timerqueue_getnext(&base->active);
+	struct timerqueue_linked_node *next = timerqueue_linked_first(&base->active);
 
 	base->expires_next = next ? next->expires : KTIME_MAX;
 }
@@ -1148,9 +1148,9 @@ static void __remove_hrtimer(struct hrti
 	/* Pairs with the lockless read in hrtimer_is_queued() */
 	WRITE_ONCE(timer->is_queued, newstate);
 
-	was_first = &timer->node == timerqueue_getnext(&base->active);
+	was_first = !timerqueue_linked_prev(&timer->node);
 
-	if (!timerqueue_del(&base->active, &timer->node))
+	if (!timerqueue_linked_del(&base->active, &timer->node))
 		cpu_base->active_bases &= ~(1 << base->index);
 
 	/* Nothing to update if this was not the first timer in the base */
@@ -1212,8 +1212,8 @@ remove_and_enqueue_same_base(struct hrti
 	/* Remove it from the timer queue if active */
 	if (timer->is_queued) {
 		debug_hrtimer_deactivate(timer);
-		was_first = &timer->node == timerqueue_getnext(&base->active);
-		timerqueue_del(&base->active, &timer->node);
+		was_first = !timerqueue_linked_prev(&timer->node);
+		timerqueue_linked_del(&base->active, &timer->node);
 	}
 
 	/* Set the new expiry time */
@@ -1226,7 +1226,7 @@ remove_and_enqueue_same_base(struct hrti
 	WRITE_ONCE(timer->is_queued, HRTIMER_STATE_ENQUEUED);
 
 	/* If it's the first expiring timer now or again, update base */
-	if (timerqueue_add(&base->active, &timer->node)) {
+	if (timerqueue_linked_add(&base->active, &timer->node)) {
 		base->expires_next = expires;
 		return true;
 	}
@@ -1758,7 +1758,7 @@ static void __hrtimer_setup(struct hrtim
 	timer->is_hard = !!(mode & HRTIMER_MODE_HARD);
 	timer->is_lazy = !!(mode & HRTIMER_MODE_LAZY_REARM);
 	timer->base = &cpu_base->clock_base[base];
-	timerqueue_init(&timer->node);
+	timerqueue_linked_init(&timer->node);
 
 	if (WARN_ON_ONCE(!fn))
 		ACCESS_PRIVATE(timer, function) = hrtimer_dummy_timeout;
@@ -1923,7 +1923,7 @@ static void __run_hrtimer(struct hrtimer
 
 static __always_inline struct hrtimer *clock_base_next_timer_safe(struct hrtimer_clock_base *base)
 {
-	struct timerqueue_node *next = timerqueue_getnext(&base->active);
+	struct timerqueue_linked_node *next = timerqueue_linked_first(&base->active);
 
 	return next ? container_of(next, struct hrtimer, node) : NULL;
 }
@@ -2369,7 +2369,7 @@ int hrtimers_prepare_cpu(unsigned int cp
 
 		clock_b->cpu_base = cpu_base;
 		seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
-		timerqueue_init_head(&clock_b->active);
+		timerqueue_linked_init_head(&clock_b->active);
 	}
 
 	cpu_base->cpu = cpu;
@@ -2399,10 +2399,10 @@ int hrtimers_cpu_starting(unsigned int c
 static void migrate_hrtimer_list(struct hrtimer_clock_base *old_base,
 				 struct hrtimer_clock_base *new_base)
 {
-	struct timerqueue_node *node;
+	struct timerqueue_linked_node *node;
 	struct hrtimer *timer;
 
-	while ((node = timerqueue_getnext(&old_base->active))) {
+	while ((node = timerqueue_linked_first(&old_base->active))) {
 		timer = container_of(node, struct hrtimer, node);
 		BUG_ON(hrtimer_callback_running(timer));
 		debug_hrtimer_deactivate(timer);
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -56,13 +56,11 @@ print_timer(struct seq_file *m, struct h
 		   (long long)(ktime_to_ns(hrtimer_get_expires(timer)) - now));
 }
 
-static void
-print_active_timers(struct seq_file *m, struct hrtimer_clock_base *base,
-		    u64 now)
+static void print_active_timers(struct seq_file *m, struct hrtimer_clock_base *base, u64 now)
 {
+	struct timerqueue_linked_node *curr;
 	struct hrtimer *timer, tmp;
 	unsigned long next = 0, i;
-	struct
 timerqueue_node *curr;
 	unsigned long flags;
 
 next_one:
@@ -72,13 +70,13 @@ print_active_timers(struct seq_file *m,
 
 	raw_spin_lock_irqsave(&base->cpu_base->lock, flags);
 
-	curr = timerqueue_getnext(&base->active);
+	curr = timerqueue_linked_first(&base->active);
 	/*
	 * Crude but we have to do this O(N*N) thing, because
	 * we have to unlock the base when printing:
	 */
 	while (curr && i < next) {
-		curr = timerqueue_iterate_next(curr);
+		curr = timerqueue_linked_next(curr);
 		i++;
 	}
From nobody Fri Apr 17 00:16:24 2026
Date: Tue, 24 Feb 2026 17:39:02 +0100
Message-ID: <20260224163431.873359816@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Subject: [patch 47/48] hrtimer: Try to modify timers in place
References: <20260224163022.795809588@kernel.org>

When modifying the expiry of an armed timer, it is first dequeued, then
the expiry value is updated and then it is queued again. This can be
avoided when the new expiry value lies between the expiries of the
previous and the next timer, as that does not change the position in the
RB tree.

The linked timerqueue allows peeking at the neighbours to check whether
the new expiry time is within the range of the previous and next timer.
If so, just modify the timer in place and spare the dequeue and enqueue
effort, which might end up rotating the RB tree twice for nothing.

This speeds up the handling of frequently rearmed hrtimers, like the
hrtick scheduler timer, significantly.
Signed-off-by: Thomas Gleixner
---
 kernel/time/hrtimer.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1203,6 +1203,31 @@ static inline bool remove_hrtimer(struct
 	return false;
 }
 
+/*
+ * Update in place has to retrieve the expiry times of the neighbour nodes
+ * if they exist. That is cache line neutral because the dequeue/enqueue
+ * operation is going to need the same cache lines. But there is a big win
+ * when the dequeue/enqueue can be avoided because the RB tree does not
+ * have to be rebalanced twice.
+ */
+static inline bool
+hrtimer_can_update_in_place(struct hrtimer *timer, struct hrtimer_clock_base *base, ktime_t expires)
+{
+	struct timerqueue_linked_node *next = timerqueue_linked_next(&timer->node);
+	struct timerqueue_linked_node *prev = timerqueue_linked_prev(&timer->node);
+
+	/* If the new expiry goes behind the next timer, requeue is required */
+	if (next && expires > next->expires)
+		return false;
+
+	/* If this is the first timer, update in place */
+	if (!prev)
+		return true;
+
+	/* Update in place when it does not go ahead of the previous one */
+	return expires >= prev->expires;
+}
+
 static inline bool
 remove_and_enqueue_same_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
 			     const enum hrtimer_mode mode, ktime_t expires, u64 delta_ns)
@@ -1211,8 +1236,18 @@ remove_and_enqueue_same_base(struct hrti
 
 	/* Remove it from the timer queue if active */
 	if (timer->is_queued) {
-		debug_hrtimer_deactivate(timer);
 		was_first = !timerqueue_linked_prev(&timer->node);
+
+		/* Try to update in place to avoid the de/enqueue dance */
+		if (hrtimer_can_update_in_place(timer, base, expires)) {
+			hrtimer_set_expires_range_ns(timer, expires, delta_ns);
+			trace_hrtimer_start(timer, mode, true);
+			if (was_first)
+				base->expires_next = expires;
+			return was_first;
+		}
+
+		debug_hrtimer_deactivate(timer);
 		timerqueue_linked_del(&base->active, &timer->node);
 	}

From: Thomas Gleixner
Date: Tue, 24 Feb 2026 17:39:08 +0100
Message-ID: <20260224163431.937531564@kernel.org>
Subject: [patch 48/48] sched: Default enable HRTICK when deferred rearming is enabled
References: <20260224163022.795809588@kernel.org>

From: Peter Zijlstra

The deferred rearm of the clock event device after an interrupt and the other hrtimer optimizations now allow HRTICK to be enabled for generic entry architectures. This decouples preemption from CONFIG_HZ, leaving only the periodic load balancer and various accounting bits relying on the tick.

Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
---
 kernel/sched/features.h | 5 +++++
 1 file changed, 5 insertions(+)

--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -63,8 +63,13 @@ SCHED_FEAT(DELAY_ZERO, true)
  */
 SCHED_FEAT(WAKEUP_PREEMPTION, true)
 
+#ifdef CONFIG_HRTIMER_REARM_DEFERRED
+SCHED_FEAT(HRTICK, true)
+SCHED_FEAT(HRTICK_DL, true)
+#else
 SCHED_FEAT(HRTICK, false)
 SCHED_FEAT(HRTICK_DL, false)
+#endif
 
 /*
  * Decrement CPU capacity based on time not spent running tasks