Message-ID: <20260121162508.011240183@infradead.org>
User-Agent: quilt/0.68
Date: Wed, 21 Jan 2026 17:20:15 +0100
From: Peter Zijlstra
To: tglx@linutronix.de
Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org,
 peterz@infradead.org, luto@kernel.org, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org,
 oliver.sang@intel.com
Subject: [PATCH v2 5/6] entry,hrtimer: Push reprogramming timers into the interrupt return path
References: <20260121162010.647043073@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently hrtimer_interrupt() runs expired timers, which can re-arm
themselves, after which it computes the next expiration time and
re-programs the hardware.

However, things like HRTICK, a highres timer driving preemption,
cannot re-arm itself at the point of running, since the next task has
not been determined yet.
The schedule() in the interrupt return path will switch to the next
task, which then causes a new hrtimer to be programmed.

This then results in reprogramming the hardware at least twice, once
after running the timers, and once upon selecting the new task.

Notably, *both* events happen in the interrupt.

By pushing the hrtimer reprogram all the way into the interrupt return
path, it runs after schedule() and this double reprogram can be
avoided.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/asm-generic/thread_info_tif.h |    5 ++++-
 include/linux/hrtimer.h               |   17 +++++++++++++++++
 include/linux/irq-entry-common.h      |    2 ++
 kernel/entry/common.c                 |   13 +++++++++++++
 kernel/sched/core.c                   |   10 ++++++++++
 kernel/time/hrtimer.c                 |   28 ++++++++++++++++++++++++----
 6 files changed, 70 insertions(+), 5 deletions(-)

--- a/include/asm-generic/thread_info_tif.h
+++ b/include/asm-generic/thread_info_tif.h
@@ -41,11 +41,14 @@
 #define _TIF_PATCH_PENDING	BIT(TIF_PATCH_PENDING)
 
 #ifdef HAVE_TIF_RESTORE_SIGMASK
-# define TIF_RESTORE_SIGMASK	10	// Restore signal mask in do_signal() */
+# define TIF_RESTORE_SIGMASK	10	// Restore signal mask in do_signal()
 # define _TIF_RESTORE_SIGMASK	BIT(TIF_RESTORE_SIGMASK)
 #endif
 
 #define TIF_RSEQ		11	// Run RSEQ fast path
 #define _TIF_RSEQ		BIT(TIF_RSEQ)
 
+#define TIF_HRTIMER_REARM	12	// re-arm the timer
+#define _TIF_HRTIMER_REARM	BIT(TIF_HRTIMER_REARM)
+
 #endif /* _ASM_GENERIC_THREAD_INFO_TIF_H_ */
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -175,10 +175,27 @@ extern void hrtimer_interrupt(struct clo
 
 extern unsigned int hrtimer_resolution;
 
+#ifdef TIF_HRTIMER_REARM
+extern void _hrtimer_rearm(void);
+/*
+ * This is to be called on all irqentry_exit() paths that will enable
+ * interrupts; as well as in the context switch path before switch_to().
+ */
+static inline void hrtimer_rearm(void)
+{
+	if (test_thread_flag(TIF_HRTIMER_REARM))
+		_hrtimer_rearm();
+}
+#else
+static inline void hrtimer_rearm(void) { }
+#endif /* TIF_HRTIMER_REARM */
+
 #else
 
 #define hrtimer_resolution	(unsigned int)LOW_RES_NSEC
 
+static inline void hrtimer_rearm(void) { }
+
 #endif
 
 static inline ktime_t
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -224,6 +224,8 @@ static __always_inline void __exit_to_us
 	ti_work = read_thread_flags();
 	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
 		ti_work = exit_to_user_mode_loop(regs, ti_work);
+	else
+		hrtimer_rearm();
 
 	arch_exit_to_user_mode_prepare(regs, ti_work);
 }
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 /* Workaround to allow gradual conversion of architecture code */
 void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
@@ -26,6 +27,16 @@ static __always_inline unsigned long __e
 	 */
 	while (ti_work & EXIT_TO_USER_MODE_WORK_LOOP) {
 
+		/*
+		 * If hrtimer need re-arming, do so before enabling IRQs,
+		 * except when a reschedule is needed, in that case schedule()
+		 * will do this.
+		 */
+		if ((ti_work & (_TIF_NEED_RESCHED |
+				_TIF_NEED_RESCHED_LAZY |
+				_TIF_HRTIMER_REARM)) == _TIF_HRTIMER_REARM)
+			hrtimer_rearm();
+
 		local_irq_enable_exit_to_user(ti_work);
 
 		if (ti_work & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
@@ -202,6 +213,7 @@ noinstr void irqentry_exit(struct pt_reg
 		 */
 		if (state.exit_rcu) {
 			instrumentation_begin();
+			hrtimer_rearm();
 			/* Tell the tracer that IRET will enable interrupts */
 			trace_hardirqs_on_prepare();
 			lockdep_hardirqs_on_prepare();
@@ -215,6 +227,7 @@ noinstr void irqentry_exit(struct pt_reg
 		if (IS_ENABLED(CONFIG_PREEMPTION))
 			irqentry_exit_cond_resched();
 
+		hrtimer_rearm();
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6814,6 +6814,16 @@ static void __sched notrace __schedule(i
 keep_resched:
 	rq->last_seen_need_resched_ns = 0;
 
+	/*
+	 * Notably, this must be called after pick_next_task() but before
+	 * switch_to(), since the new task need not be on the return from
+	 * interrupt path. Additionally, exit_to_user_mode_loop() relies on
+	 * any schedule() call to imply this call, so do it unconditionally.
+	 *
+	 * We've just cleared TIF_NEED_RESCHED, TIF word should be in cache.
+	 */
+	hrtimer_rearm();
+
 	is_switch = prev != next;
 	if (likely(is_switch)) {
 		rq->nr_switches++;
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1892,10 +1892,9 @@ static __latent_entropy void hrtimer_run
  * Very similar to hrtimer_force_reprogram(), except it deals with
  * in_hrirq and hang_detected.
  */
-static void __hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now)
+static void __hrtimer_rearm(struct hrtimer_cpu_base *cpu_base,
+			    ktime_t now, ktime_t expires_next)
 {
-	ktime_t expires_next = hrtimer_update_next_event(cpu_base);
-
 	cpu_base->expires_next = expires_next;
 	cpu_base->in_hrtirq = 0;
 
@@ -1970,9 +1969,30 @@ void hrtimer_interrupt(struct clock_even
 		cpu_base->hang_detected = 1;
 	}
 
-	__hrtimer_rearm(cpu_base, now);
+#ifdef TIF_HRTIMER_REARM
+	set_thread_flag(TIF_HRTIMER_REARM);
+#else
+	__hrtimer_rearm(cpu_base, now, expires_next);
+#endif
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 }
+
+#ifdef TIF_HRTIMER_REARM
+void _hrtimer_rearm(void)
+{
+	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+	ktime_t now, expires_next;
+
+	lockdep_assert_irqs_disabled();
+
+	scoped_guard (raw_spinlock, &cpu_base->lock) {
+		now = hrtimer_update_base(cpu_base);
+		expires_next = hrtimer_update_next_event(cpu_base);
+		__hrtimer_rearm(cpu_base, now, expires_next);
+		clear_thread_flag(TIF_HRTIMER_REARM);
+	}
+}
+#endif /* TIF_HRTIMER_REARM */
 #endif /* !CONFIG_HIGH_RES_TIMERS */
 
 /*
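
Illustration only, not part of the patch: the double reprogram described
in the changelog can be sketched as a small userspace model. Every name
below (struct cpu_model, timer_start(), schedule_model(),
count_hw_programs(), ...) is hypothetical and is not a kernel API. The
model also assumes that arming a timer while a deferred re-arm is
pending skips the hardware write; the diff above does not show that
part, so treat it as an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct cpu_model {
	uint64_t expires_next;	/* earliest pending expiry */
	bool rearm_pending;	/* stands in for TIF_HRTIMER_REARM */
	unsigned int hw_programs; /* writes to the (modelled) timer hardware */
};

static void program_hw(struct cpu_model *cpu)
{
	cpu->hw_programs++;
}

/*
 * Arm a timer. Assumption of this model: while a deferred re-arm is
 * pending, the hardware write is skipped because the re-arm will do it.
 */
static void timer_start(struct cpu_model *cpu, uint64_t expires)
{
	if (expires >= cpu->expires_next)
		return;
	cpu->expires_next = expires;
	if (!cpu->rearm_pending)
		program_hw(cpu);
}

/* Deferred re-arm: a single hardware write, only if one is pending. */
static void rearm(struct cpu_model *cpu)
{
	if (!cpu->rearm_pending)
		return;
	cpu->rearm_pending = false;
	program_hw(cpu);
}

/* Timer interrupt: reprogram immediately, or only mark it as pending. */
static void timer_interrupt(struct cpu_model *cpu, uint64_t next_expiry,
			    bool defer)
{
	cpu->expires_next = next_expiry;
	if (defer)
		cpu->rearm_pending = true;
	else
		program_hw(cpu);
}

/* Pick the next task, arm its preemption tick, then do any pending re-arm. */
static void schedule_model(struct cpu_model *cpu, uint64_t slice_end)
{
	timer_start(cpu, slice_end);
	rearm(cpu);
}

/* Hardware writes for one timer interrupt followed by a reschedule. */
unsigned int count_hw_programs(bool defer)
{
	struct cpu_model cpu = { .expires_next = UINT64_MAX };

	timer_interrupt(&cpu, 1000, defer);
	schedule_model(&cpu, 500);
	rearm(&cpu);		/* exit path; no-op if schedule did it */
	assert(!cpu.rearm_pending);
	return cpu.hw_programs;
}
```

In this model count_hw_programs(false) stands for the current behaviour
and count_hw_programs(true) for the deferred one. Under the stated
assumptions the former performs two hardware writes and the latter one.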