[tip: timers/urgent] timers/migration: Fix livelock in tmigr_handle_remote_up()

tip-bot2 for Amit Matityahu posted 1 patch 3 days, 18 hours ago
kernel/time/timer_migration.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     d486b4934a8e504376b85cdb3766f306d57aff5b
Gitweb:        https://git.kernel.org/tip/d486b4934a8e504376b85cdb3766f306d57aff5b
Author:        Amit Matityahu <amitmat@amazon.com>
AuthorDate:    Wed, 03 Jun 2026 17:01:39 
Committer:     Thomas Gleixner <tglx@kernel.org>
CommitterDate: Thu, 04 Jun 2026 14:35:33 +02:00

timers/migration: Fix livelock in tmigr_handle_remote_up()

tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
smp_processor_id(), assuming the local softirq path already handled this
CPU's timers.

This assumption is wrong because jiffies can advance after the handling of
the CPU's global timers in run_timer_base(BASE_GLOBAL) and before
tmigr_handle_remote() evaluates the expiry times.

As a consequence, a timer which was not yet expired when the CPU-local
timer wheel was processed, but is expired by the time the remote handling
runs, is ignored: its callback is never invoked and the timer is never
removed from the timer wheel.

What's worse is that fetch_next_timer_interrupt_remote() keeps reporting it
as expired, so the event is re-queued with expires == now on each
iteration and the goto-again loop in tmigr_handle_remote_up() spins
indefinitely.

Fix this by calling timer_expire_remote() unconditionally. That's minimal
overhead for the common case as __run_timer_base() returns immediately if
there is nothing to expire in the local wheel.

[ tglx: Amend change log and add a comment ]

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Reported-by: Alon Kariv <alonka@amazon.com>
Signed-off-by: Amit Matityahu <amitmat@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260603170139.33628-1-amitmat@amazon.com
---
 kernel/time/timer_migration.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
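
The race described in the change log can be modelled in user space. The
sketch below is not kernel code: struct model_wheel, model_expire() and
model_handle_remote() are hypothetical names made up for illustration, the
tick values are arbitrary, and the max_iter cap exists only so the model
terminates where the real loop would keep spinning. It assumes one timer
that becomes due exactly one tick after the local pass has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_wheel {
	unsigned long expires;	/* expiry of the single queued timer, in ticks */
	bool pending;		/* timer is still queued in the wheel */
};

/* Expire the timer if it is due at @now; otherwise return immediately. */
static void model_expire(struct model_wheel *w, unsigned long now)
{
	if (w->pending && w->expires <= now)
		w->pending = false;
}

/*
 * Returns the number of passes through the retry loop, capped at @max_iter.
 * Hitting the cap stands in for the livelock.
 */
static unsigned int model_handle_remote(bool skip_local, unsigned int max_iter)
{
	struct model_wheel w = { .expires = 101, .pending = true };
	unsigned long jiffies = 100;
	unsigned int iter = 0;

	assert(max_iter > 0);

	/* Local softirq pass at tick 100: the timer (101) is not due yet. */
	model_expire(&w, jiffies);

	/* The tick advances before the remote path takes its snapshot. */
	jiffies++;

	/* Retry loop: runs as long as the next event is reported as expired. */
	while (w.pending && w.expires <= jiffies && iter < max_iter) {
		iter++;
		if (!skip_local)
			model_expire(&w, jiffies);
		/* With skip_local the timer stays queued and is re-reported. */
	}

	return iter;
}
```

In this model, model_handle_remote(true, 1000) runs into the cap, which
corresponds to the old behaviour of skipping the local CPU, while
model_handle_remote(false, 1000) leaves the loop after a single pass.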

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 1d0d3a4..52c15af 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -978,8 +978,12 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	/* Drop the lock to allow the remote CPU to exit idle */
 	raw_spin_unlock_irq(&tmc->lock);
 
-	if (cpu != smp_processor_id())
-		timer_expire_remote(cpu);
+	/*
+	 * This can't exclude the local CPU because jiffies might have advanced
+	 * after the timer softirq invoked run_timer_base(BASE_GLOBAL) and the
+	 * point where the jiffies snapshot @jif was taken in tmigr_handle_remote().
+	 */
+	timer_expire_remote(cpu);
 
 	/*
 	 * Lock ordering needs to be preserved - timer_base locks before tmigr