[tip: timers/urgent] tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()

Posted by tip-bot2 for Oleg Nesterov 1 year, 8 months ago
The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     07c54cc5988f19c9642fd463c2dbdac7fc52f777
Gitweb:        https://git.kernel.org/tip/07c54cc5988f19c9642fd463c2dbdac7fc52f777
Author:        Oleg Nesterov <oleg@redhat.com>
AuthorDate:    Tue, 28 May 2024 14:20:19 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 10 Jun 2024 20:18:13 +02:00

tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()

After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.

When the boot CPU is nohz_full, tick_setup_device() calls
tick_take_do_timer_from_boot() to update tick_do_timer_cpu, and this
triggers the WARN_ON_ONCE(irqs_disabled) in smp_call_function_single().

Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(). The new
comment explains why this is safe (thanks Thomas!).

Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com
Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com
---
 kernel/time/tick-common.c | 42 ++++++++++++--------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b130..a47bcf7 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
 	}
 }
 
-#ifdef CONFIG_NO_HZ_FULL
-static void giveup_do_timer(void *info)
-{
-	int cpu = *(unsigned int *)info;
-
-	WARN_ON(tick_do_timer_cpu != smp_processor_id());
-
-	tick_do_timer_cpu = cpu;
-}
-
-static void tick_take_do_timer_from_boot(void)
-{
-	int cpu = smp_processor_id();
-	int from = tick_do_timer_boot_cpu;
-
-	if (from >= 0 && from != cpu)
-		smp_call_function_single(from, giveup_do_timer, &cpu, 1);
-}
-#endif
-
 /*
  * Setup the tick device
  */
@@ -221,19 +201,25 @@ static void tick_setup_device(struct tick_device *td,
 			tick_next_period = ktime_get();
 #ifdef CONFIG_NO_HZ_FULL
 			/*
-			 * The boot CPU may be nohz_full, in which case set
-			 * tick_do_timer_boot_cpu so the first housekeeping
-			 * secondary that comes up will take do_timer from
-			 * us.
+			 * The boot CPU may be nohz_full, in which case the
+			 * first housekeeping secondary will take do_timer()
+			 * from it.
 			 */
 			if (tick_nohz_full_cpu(cpu))
 				tick_do_timer_boot_cpu = cpu;
 
-		} else if (tick_do_timer_boot_cpu != -1 &&
-						!tick_nohz_full_cpu(cpu)) {
-			tick_take_do_timer_from_boot();
+		} else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
 			tick_do_timer_boot_cpu = -1;
-			WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu);
+			/*
+			 * The boot CPU will stay in periodic (NOHZ disabled)
+			 * mode until clocksource_done_booting() called after
+			 * smp_init() selects a high resolution clocksource and
+			 * timekeeping_notify() kicks the NOHZ stuff alive.
+			 *
+			 * So this WRITE_ONCE can only race with the READ_ONCE
+			 * check in tick_periodic() but this race is harmless.
+			 */
+			WRITE_ONCE(tick_do_timer_cpu, cpu);
 #endif
 		}
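
For readers who want to follow the hand-over outside the kernel, here is a minimal userspace sketch of the ownership logic after this patch. It is a model, not kernel code: the READ_ONCE()/WRITE_ONCE() macros are simplified int-only stand-ins, nohz_full_mask[], NR_CPUS and the model_* function names are hypothetical, and the first branch plus the TICK_DO_TIMER_BOOT value of -2 are assumed from the surrounding kernel source rather than taken from the hunk above.

```c
/* Userspace model of the do_timer hand-over; illustrative only. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                /* hypothetical CPU count for the model */
#define TICK_DO_TIMER_BOOT (-2)  /* assumed "not yet assigned" marker */

/* Simplified stand-ins for the kernel macros: one volatile int access. */
#define READ_ONCE(x)     (*(const volatile int *)&(x))
#define WRITE_ONCE(x, v) (*(volatile int *)&(x) = (v))

static int tick_do_timer_cpu = TICK_DO_TIMER_BOOT;
static int tick_do_timer_boot_cpu = -1;
static bool nohz_full_mask[NR_CPUS];  /* hypothetical nohz_full= setting */

bool tick_nohz_full_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return nohz_full_mask[cpu];
}

/* Models only the do_timer ownership part of tick_setup_device(). */
void model_tick_setup_device(int cpu)
{
	if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_BOOT) {
		/* First CPU to register a tick device, i.e. the boot CPU. */
		WRITE_ONCE(tick_do_timer_cpu, cpu);
		/* Remember a nohz_full boot CPU so a housekeeper can take over. */
		if (tick_nohz_full_cpu(cpu))
			tick_do_timer_boot_cpu = cpu;
	} else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
		/* First housekeeping secondary: plain store, no cross-CPU call. */
		tick_do_timer_boot_cpu = -1;
		WRITE_ONCE(tick_do_timer_cpu, cpu);
	}
}

/* Models the READ_ONCE() ownership check done in tick_periodic(). */
bool model_owns_do_timer(int cpu)
{
	return READ_ONCE(tick_do_timer_cpu) == cpu;
}
```

In this model the only reader that can run concurrently with the store is the ownership check, which sees either the old or the new owner. The model does not demonstrate why that is harmless in the kernel; that argument is the one given in the new comment in the patch.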
Re: [tip: timers/urgent] tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
Posted by Frederic Weisbecker 1 year, 8 months ago
On Mon, Jun 10, 2024 at 06:26:21PM -0000, tip-bot2 for Oleg Nesterov wrote:
> The following commit has been merged into the timers/urgent branch of tip:
> 
> Commit-ID:     07c54cc5988f19c9642fd463c2dbdac7fc52f777
> Gitweb:        https://git.kernel.org/tip/07c54cc5988f19c9642fd463c2dbdac7fc52f777
> Author:        Oleg Nesterov <oleg@redhat.com>
> AuthorDate:    Tue, 28 May 2024 14:20:19 +02:00
> Committer:     Thomas Gleixner <tglx@linutronix.de>
> CommitterDate: Mon, 10 Jun 2024 20:18:13 +02:00
> 
> tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
> 
> After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
> when the boot CPU is nohz_full") the kernel no longer crashes, but there is
> another problem.
> 
> In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
> update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
> in smp_call_function_single().
> 
> Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new
> comment explains why this is safe (thanks Thomas!).
> 
> Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com
> Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com

I think we agreed on that version actually:

https://lore.kernel.org/all/20240603153557.GA8311@redhat.com/

Thanks.