From: Qiliang Yuan
Date: Wed, 25 Mar 2026 17:09:41 +0800
Subject: [PATCH 10/15] tick/nohz: Transition to dynamic full dynticks state management
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20260325-dhei-v12-final-v1-10-919cca23cadf@gmail.com>
References: <20260325-dhei-v12-final-v1-0-919cca23cadf@gmail.com>
In-Reply-To: <20260325-dhei-v12-final-v1-0-919cca23cadf@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Thomas Gleixner, "Paul E. McKenney",
 Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
 Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
 Tejun Heo, Andrew Morton, Vlastimil Babka, Suren Baghdasaryan,
 Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan,
 Anna-Maria Behnsen, Ingo Molnar, Shuah Khan
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Qiliang Yuan
X-Mailer: b4 0.13.0

Context:
Full dynticks (NOHZ_FULL) is typically a static configuration determined
at boot time. DHEI extends this to support runtime activation.

Problem:
Switching to NOHZ_FULL at runtime requires careful synchronization of
context tracking and housekeeping states. Re-invoking setup logic
multiple times could lead to inconsistencies or warnings, and RCU
dependency checks often prevented tick suppression in "Zero-Conf"
setups.

Solution:
- Replaced the static tick_nohz_full_enabled() checks with a dynamic
  tick_nohz_full_running state variable.
- Refactored tick_nohz_full_setup to be safe for runtime invocation,
  adding guards against re-initialization and ensuring IRQ work
  interrupt support.
- Implemented boot-time pre-activation of context tracking (shadow
  init) for all possible CPUs to avoid instruction flow issues during
  dynamic transitions.
- Restored standard rcu_needs_cpu() checks now that RCU supports native
  dynamic NOCB mode switching.

This provides the core state machine for reliable, on-demand tick
suppression and high-performance isolation.
---
 kernel/time/tick-sched.c | 130 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 105 insertions(+), 25 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2f8a7923fa279..dee42cea259a9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -621,13 +622,25 @@ void __tick_nohz_task_switch(void)
 /* Get the boot-time nohz CPU list from the kernel parameters. */
 void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 {
-	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+	if (!tick_nohz_full_mask) {
+		if (!slab_is_available())
+			alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+		else
+			zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
+	}
 	cpumask_copy(tick_nohz_full_mask, cpumask);
 	tick_nohz_full_running = true;
 }
 
 bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
 {
+	/*
+	 * Allow all CPUs to go down during shutdown/reboot to avoid
+	 * interfering with the final power-off sequence.
+	 */
+	if (system_state > SYSTEM_RUNNING)
+		return true;
+
 	/*
 	 * The 'tick_do_timer_cpu' CPU handles housekeeping duty (unbound
 	 * timers, workqueues, timekeeping, ...) on behalf of full dynticks
@@ -643,45 +656,112 @@ static int tick_nohz_cpu_down(unsigned int cpu)
 	return tick_nohz_cpu_hotpluggable(cpu) ? 0 : -EBUSY;
 }
 
+static int tick_nohz_housekeeping_reconfigure(struct notifier_block *nb,
+					      unsigned long action, void *data)
+{
+	struct housekeeping_update *upd = data;
+	int cpu;
+
+	if (action == HK_UPDATE_MASK && upd->type == HK_TYPE_TICK) {
+		cpumask_var_t non_housekeeping_mask;
+
+		if (!alloc_cpumask_var(&non_housekeeping_mask, GFP_KERNEL))
+			return NOTIFY_BAD;
+
+		cpumask_andnot(non_housekeeping_mask, cpu_possible_mask, upd->new_mask);
+
+		if (!tick_nohz_full_mask) {
+			if (!zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL)) {
+				free_cpumask_var(non_housekeeping_mask);
+				return NOTIFY_BAD;
+			}
+		}
+
+		/* Kick all CPUs to re-evaluate tick dependency before change */
+		for_each_online_cpu(cpu)
+			tick_nohz_full_kick_cpu(cpu);
+
+		cpumask_copy(tick_nohz_full_mask, non_housekeeping_mask);
+		tick_nohz_full_running = !cpumask_empty(tick_nohz_full_mask);
+
+		/*
+		 * If nohz_full is running, the timer duty must be on a housekeeper.
+		 * If the current timer CPU is not a housekeeper, or no duty is assigned,
+		 * pick the first housekeeper and assign it.
+		 */
+		if (tick_nohz_full_running) {
+			int timer_cpu = READ_ONCE(tick_do_timer_cpu);
+			if (timer_cpu == TICK_DO_TIMER_NONE ||
+			    !cpumask_test_cpu(timer_cpu, upd->new_mask)) {
+				int next_timer = cpumask_first(upd->new_mask);
+				if (next_timer < nr_cpu_ids)
+					WRITE_ONCE(tick_do_timer_cpu, next_timer);
+			}
+		}
+
+		/* Kick all CPUs again to apply new nohz full state */
+		for_each_online_cpu(cpu)
+			tick_nohz_full_kick_cpu(cpu);
+
+		free_cpumask_var(non_housekeeping_mask);
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block tick_nohz_housekeeping_nb = {
+	.notifier_call = tick_nohz_housekeeping_reconfigure,
+};
+
 void __init tick_nohz_init(void)
 {
 	int cpu, ret;
 
-	if (!tick_nohz_full_running)
-		return;
-
-	/*
-	 * Full dynticks uses IRQ work to drive the tick rescheduling on safe
-	 * locking contexts. But then we need IRQ work to raise its own
-	 * interrupts to avoid circular dependency on the tick.
-	 */
-	if (!arch_irq_work_has_interrupt()) {
-		pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
-		cpumask_clear(tick_nohz_full_mask);
-		tick_nohz_full_running = false;
-		return;
+	if (!tick_nohz_full_mask) {
+		if (!slab_is_available())
+			alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+		else
+			zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
 	}
 
-	if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
-			!IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
-		cpu = smp_processor_id();
+	housekeeping_register_notifier(&tick_nohz_housekeeping_nb);
 
-		if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
-			pr_warn("NO_HZ: Clearing %d from nohz_full range "
-				"for timekeeping\n", cpu);
-			cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+	if (tick_nohz_full_running) {
+		/*
+		 * Full dynticks uses IRQ work to drive the tick rescheduling on safe
+		 * locking contexts. But then we need IRQ work to raise its own
+		 * interrupts to avoid circular dependency on the tick.
+		 */
+		if (!arch_irq_work_has_interrupt()) {
+			pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
+			cpumask_clear(tick_nohz_full_mask);
+			tick_nohz_full_running = false;
+			goto out;
 		}
+
+		if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
+		    !IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
+			cpu = smp_processor_id();
+
+			if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
+				pr_warn("NO_HZ: Clearing %d from nohz_full range "
+					"for timekeeping\n", cpu);
+				cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+			}
+		}
+
+		pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
+			cpumask_pr_args(tick_nohz_full_mask));
 	}
 
-	for_each_cpu(cpu, tick_nohz_full_mask)
+out:
+	for_each_possible_cpu(cpu)
 		ct_cpu_track_user(cpu);
 
 	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
 					"kernel/nohz:predown", NULL,
 					tick_nohz_cpu_down);
 	WARN_ON(ret < 0);
-	pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
-		cpumask_pr_args(tick_nohz_full_mask));
 }
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 
@@ -1200,7 +1280,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(report_idle_softirq()))
 		return false;
 
-	if (tick_nohz_full_enabled()) {
+	if (tick_nohz_full_running) {
 		int tick_cpu = READ_ONCE(tick_do_timer_cpu);
 
 		/*
-- 
2.43.0
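
Illustrative note, not part of the patch: the timer-duty handover rule
in tick_nohz_housekeeping_reconfigure() can be modelled in userspace as
a rough sketch. Everything prefixed sketch_/SKETCH_ below is a
hypothetical stand-in (a 64-bit word in place of a cpumask, -1 in place
of TICK_DO_TIMER_NONE); it assumes at most 64 CPUs and ignores the
kicks, READ_ONCE/WRITE_ONCE and any locking the kernel code needs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; not the kernel API. */
#define SKETCH_NR_CPUS		64
#define SKETCH_TIMER_NONE	(-1)

/*
 * Lowest set bit of mask, or SKETCH_NR_CPUS when the mask is empty
 * (modelled on cpumask_first() returning >= nr_cpu_ids in that case).
 */
static int sketch_mask_first(uint64_t mask)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return SKETCH_NR_CPUS;
}

/* Non-zero when cpu is a valid index and its bit is set in mask. */
static int sketch_mask_test(int cpu, uint64_t mask)
{
	return cpu >= 0 && cpu < SKETCH_NR_CPUS && ((mask >> cpu) & 1);
}

/*
 * Return the CPU that should own the timer duty once the housekeeping
 * mask becomes new_hk. The nohz_full set is modelled as possible & ~new_hk.
 */
static int sketch_pick_timer_cpu(int timer_cpu, uint64_t possible,
				 uint64_t new_hk)
{
	uint64_t nohz_full = possible & ~new_hk;
	int next;

	assert(timer_cpu == SKETCH_TIMER_NONE ||
	       (timer_cpu >= 0 && timer_cpu < SKETCH_NR_CPUS));

	/* No nohz_full CPUs: full dynticks is not running, keep the duty. */
	if (!nohz_full)
		return timer_cpu;

	/* The current owner is already a housekeeper: nothing to move. */
	if (timer_cpu != SKETCH_TIMER_NONE &&
	    sketch_mask_test(timer_cpu, new_hk))
		return timer_cpu;

	/* Otherwise hand the duty to the first housekeeper, if there is one. */
	next = sketch_mask_first(new_hk);
	return next < SKETCH_NR_CPUS ? next : timer_cpu;
}
```

With four possible CPUs (0xF) and housekeepers {0,1} (0x3), a duty
currently on CPU 2 moves to CPU 0, while a duty already on CPU 0 stays
put. An empty housekeeping mask leaves the duty where it was, which
matches the next_timer < nr_cpu_ids guard in the patch.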