This patch is for debugging only.

A warning is triggered when the CPU is in this state:
1) the tick was already stopped before tick_nohz_idle_stop_tick()
   stops the tick
2) and the CPU is not in nohz.idle_cpus_mask
3) and the CPU is idle
4) and the tick is stopped

The CPU will stay idle in this state, since neither the periodic nor
the NOHZ idle load balancing can move a task to this CPU.
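
The four conditions above can be sketched as a single predicate. This is a minimal userspace model, not kernel code: struct cpu_model, its fields, and would_warn() are hypothetical names introduced only for illustration, and each field stands in for the kernel expression named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one CPU's state; not a kernel structure. */
struct cpu_model {
	bool was_stopped;       /* tick already stopped on entry to tick_nohz_idle_stop_tick() */
	bool in_nohz_idle_mask; /* stands in for nohz_balance_idle_cpu(cpu) */
	bool idle;              /* stands in for idle_cpu(cpu) */
	bool tick_stopped;      /* stands in for tick_nohz_tick_stopped_cpu(cpu) */
};

/*
 * Mirrors the condition passed to WARN_ON_ONCE() in the patch:
 * all four conditions from the commit message must hold at once.
 */
static bool would_warn(const struct cpu_model *c)
{
	assert(c != NULL);
	return c->was_stopped && !c->in_nohz_idle_mask &&
	       c->idle && c->tick_stopped;
}
```

In this model, a CPU that is idle with its tick stopped but missing from the idle mask makes would_warn() return true; setting in_nohz_idle_mask makes it return false, because the NOHZ idle load balancer can then still consider that CPU.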
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
include/linux/sched/nohz.h | 2 ++
kernel/sched/fair.c | 5 +++++
kernel/time/tick-sched.c | 3 ++-
3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 0db7f67935fe..ea6e07777395 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -9,8 +9,10 @@
#ifdef CONFIG_NO_HZ_COMMON
extern void nohz_balance_enter_idle(int cpu);
extern int get_nohz_timer_target(void);
+extern bool nohz_balance_idle_cpu(int cpu);
#else
static inline void nohz_balance_enter_idle(int cpu) { }
+static inline bool nohz_balance_idle_cpu(int cpu) { return false; }
#endif
#ifdef CONFIG_NO_HZ_COMMON
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..cd1c17368e05 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7109,6 +7109,11 @@ static struct {
unsigned long next_blocked; /* Next update of blocked load in jiffies */
} nohz ____cacheline_aligned;
+inline bool nohz_balance_idle_cpu(int cpu)
+{
+ return cpumask_test_cpu(cpu, nohz.idle_cpus_mask);
+}
+
#endif /* CONFIG_NO_HZ_COMMON */
static unsigned long cpu_load(struct rq *rq)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b900a120ab54..8241b14842f3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1228,7 +1228,8 @@ void tick_nohz_idle_stop_tick(void)
ts->idle_sleeps++;
ts->idle_expires = expires;
-
+ WARN_ON_ONCE(was_stopped && !nohz_balance_idle_cpu(cpu) &&
+ idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu));
if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
if (!was_stopped)
ts->idle_jiffies = ts->last_jiffies;
--
2.34.1
Hello,

we are not sure of the purpose of this debug patch, so we are not sure
whether the report below supplies any useful information. We are reporting
it just FYI; if it is of low value, please kindly ignore it. Thanks.

kernel test robot noticed "WARNING:at_kernel/time/tick-sched.c:#tick_nohz_idle_stop_tick" on:

commit: d112a298a9368568686e1a399cc5073a02f60c2f ("[PATCH RESEND 2/2] tick/nohz: Trigger warning when CPU in wrong NOHZ idle state")
url: https://github.com/intel-lab-lkp/linux/commits/Adam-Li/tick-nohz-Fix-wrong-NOHZ-idle-CPU-state/20250821-122906
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 1b5f1454091e9e9fb5c944b3161acf4ec0894d0d
patch link: https://lore.kernel.org/all/20250821042707.62993-3-adamli@os.amperecomputing.com/
patch subject: [PATCH RESEND 2/2] tick/nohz: Trigger warning when CPU in wrong NOHZ idle state

in testcase: cpu-hotplug
version:
with following parameters:

config: x86_64-rhel-9.4-func
compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202509031529.464ff656-lkp@intel.com

[ 96.913433][ T0] ------------[ cut here ]------------
[ 43.020727][ T307]
[ 87.665815][ T0] Masked ExtINT on CPU#3
[ 43.030461][ T307] (Reading database ... 16595 files and directories currently installed.)
[ 89.314224][ T252] smpboot: CPU 1 is now offline
[ 96.918719][ T0] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:1231 tick_nohz_idle_stop_tick (kernel/time/tick-sched.c:1231)
[ 43.030468][ T307]
[ 87.817541][ T649] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 43.042435][ T307] Preparing to unpack .../ntpdate_1%3a4.2.8p15+dfsg-2~1.2.2+dfsg1-1+deb12u1_all.deb ...
[ 90.352309][ T252] smpboot: CPU 2 is now offline
[ 43.042441][ T307]
[ 96.928830][ T0] Modules linked in: snd_hda_codec_hdmi snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component btrfs intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common blake2b_generic xor zstd_compress snd_soc_avs x86_pkg_temp_thermal snd_soc_hda_codec snd_hda_ext_core intel_powerclamp raid6_pq snd_soc_core coretemp snd_compress i915 sd_mod snd_hda_intel sg snd_intel_dspcfg kvm_intel intel_gtt ipmi_devintf snd_intel_sdw_acpi ipmi_msghandler cec snd_hda_codec drm_buddy kvm snd_hda_core ttm mei_wdt irqbypass snd_hwdep i2c_designware_platform ghash_clmulni_intel snd_pcm dell_wmi platform_profile ahci rapl drm_display_helper i2c_designware_core mei_me dcdbas libahci dell_wmi_aio snd_timer dell_smbios intel_cstate drm_client_lib dell_smm_hwmon wmi_bmof dell_wmi_descriptor sparse_keymap intel_lpss_pci snd drm_kms_helper i2c_i801 intel_lpss libata pcspkr intel_uncore mei idma64 soundcore i2c_smbus video intel_pmc_core pmt_telemetry pmt_class wmi pinctrl_sunrisepoint
[ 87.823787][ T0] Masked ExtINT on CPU#4
[ 43.055242][ T307] Unpacking ntpdate (1:4.2.8p15+dfsg-2~1.2.2+dfsg1-1+deb12u1) ...
[ 91.391058][ T252] smpboot: CPU 3 is now offline
[ 96.928957][ T0] intel_pmc_ssram_telemetry intel_vsec acpi_pad binfmt_misc drm fuse loop dm_mod ip_tables
[ 43.055248][ T307]
[ 87.976451][ T649] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 43.065976][ T307] Selecting previously unselected package ntpsec-ntpdate.
[ 92.424836][ T252] smpboot: CPU 4 is now offline
[ 97.028568][ T0] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G I 6.16.0-rc6-00089-gd112a298a936 #1 PREEMPT(voluntary)
[ 43.065982][ T307]
[ 87.982768][ T0] Masked ExtINT on CPU#5
[ 43.076378][ T307] Preparing to unpack .../ntpsec-ntpdate_1.2.2+dfsg1-1+deb12u1_amd64.deb ...
[ 97.041089][ T0] Tainted: [I]=FIRMWARE_WORKAROUND
[ 93.454857][ T252] smpboot: CPU 5 is now offline
[ 43.076384][ T307]
[ 88.118542][ T649] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 43.087987][ T307] Unpacking ntpsec-ntpdate (1.2.2+dfsg1-1+deb12u1) ...
[ 94.507924][ T252] smpboot: CPU 6 is now offline
[ 97.046019][ T0] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 43.087993][ T307]
[ 88.124813][ T0] Masked ExtINT on CPU#6
[ 43.097723][ T307] Selecting previously unselected package python3-ntp.
[ 95.537790][ T252] smpboot: CPU 7 is now offline
[ 97.054053][ T0] RIP: 0010:tick_nohz_idle_stop_tick (kernel/time/tick-sched.c:1231)
[ 43.097729][ T307]
[ 88.268535][ T649] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 43.107761][ T307] Preparing to unpack .../python3-ntp_1.2.2+dfsg1-1+deb12u1_amd64.deb ...
[ 97.060194][ T0] Code: e0 5d f6 84 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 80 3c 02 00 0f 85 b7 00 00 00 41 f6 04 24 02 0f 84 63 ff ff ff <0f> 0b e9 5c ff ff ff 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1
All code
========
   0:   e0 5d                   loopne 0x5f
   2:   f6 84 48 b8 00 00 00    testb $0x0,0xb8(%rax,%rcx,2)
   9:   00
   a:   00 fc                   add %bh,%ah
   c:   ff                      (bad)
   d:   df 4c 89 e2             fisttps -0x1e(%rcx,%rcx,4)
  11:   48 c1 ea 03             shr $0x3,%rdx
  15:   80 3c 02 00             cmpb $0x0,(%rdx,%rax,1)
  19:   0f 85 b7 00 00 00       jne 0xd6
  1f:   41 f6 04 24 02          testb $0x2,(%r12)
  24:   0f 84 63 ff ff ff       je 0xffffffffffffff8d
  2a:*  0f 0b                   ud2 <-- trapping instruction
  2c:   e9 5c ff ff ff          jmp 0xffffffffffffff8d
  31:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  38:   fc ff df
  3b:   48 89 da                mov %rbx,%rdx
  3e:   48                      rex.W
  3f:   c1                      .byte 0xc1

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   e9 5c ff ff ff          jmp 0xffffffffffffff63
   7:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
   e:   fc ff df
  11:   48 89 da                mov %rbx,%rdx
  14:   48                      rex.W
  15:   c1                      .byte 0xc1
[ 95.560557][ T649] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 43.107767][ T307]
[ 88.274805][ T0] Masked ExtINT on CPU#7
[ 43.119061][ T307] Unpacking python3-ntp (1.2.2+dfsg1-1+deb12u1) ...
[ 95.566801][ T0] Masked ExtINT on CPU#1
[ 97.079531][ T0] RSP: 0018:ffffffff85207df0 EFLAGS: 00010002
[ 43.119067][ T307]
[ 89.314224][ T252] smpboot: CPU 1 is now offline
[ 43.128509][ T307] Setting up python3-ntp (1.2.2+dfsg1-1+deb12u1) ...
[ 95.591432][ T649] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 97.085419][ T0] RAX: dffffc0000000000 RBX: ffff888796430c40 RCX: 0000000000000000
[ 43.128515][ T307]
[ 90.352309][ T252] smpboot: CPU 2 is now offline
[ 43.138294][ T307] Setting up ntpdate (1:4.2.8p15+dfsg-2~1.2.2+dfsg1-1+deb12u1) ...
[ 95.597695][ T0] Masked ExtINT on CPU#2
[ 97.093208][ T0] RDX: 1ffff110f2c86188 RSI: 0000000000000008 RDI: ffffffff84f65de0
[ 43.138300][ T307]
[ 91.391058][ T252] smpboot: CPU 3 is now offline
[ 43.149075][ T307] Setting up ntpsec-ntpdate (1.2.2+dfsg1-1+deb12u1) ...
[ 95.625417][ T649] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 97.100997][ T0] RBP: ffff88880f074000 R08: 0000000000000000 R09: ffffed10219177b8
[ 43.149081][ T307]
[ 92.424836][ T252] smpboot: CPU 4 is now offline
[ 43.158251][ T307] NO_NETWORK=
[ 97.108775][ T0] R10: ffff88810c8bbdc7 R11: ffff888796430c70 R12: ffff888796430c40
[ 95.631679][ T0] Masked ExtINT on CPU#3
[ 43.158257][ T307]
[ 93.454857][ T252] smpboot: CPU 5 is now offline
[ 43.164205][ T307] CLOCK: time stepped by 32064468.825994
[ 95.675523][ T649] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 97.116551][ T0] R13: 0000000000000000 R14: 0000000000000000 R15: 000000081dff1000
[ 43.164212][ T307]
[ 94.507924][ T252] smpboot: CPU 6 is now offline
[ 43.172683][ T307] CLOCK: time changed from 2024-08-25 to 2025-08-31
[ 97.124328][ T0] FS: 0000000000000000(0000) GS:ffff88880f074000(0000) knlGS:0000000000000000
[ 95.681842][ T0] Masked ExtINT on CPU#4
[ 43.172690][ T307]
[ 95.537790][ T252] smpboot: CPU 7 is now offline
[ 43.183164][ T307] 2025-08-31 20:23:30.162487 (+0000) +32064468.825994 +/- 0.000310 internal-lkp-server 192.168.1.200 s5 no-leap
[ 95.721433][ T649] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 97.133051][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 43.183170][ T307]
[ 95.560557][ T649] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 43.221647][ T10] smpboot: CPU 6 is now offline
[ 97.139448][ T0] CR2: 00007fe2bc1c1e90 CR3: 000000081aa72006 CR4: 00000000003726f0
[ 95.727679][ T0] Masked ExtINT on CPU#5
[ 44.252774][ T10] smpboot: CPU 7 is now offline
[ 95.566801][ T0] Masked ExtINT on CPU#1
[ 44.289262][ T649] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 97.147225][ T0] Call Trace:
[ 95.761430][ T649] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 44.301168][ T0] Masked ExtINT on CPU#1
[ 95.591432][ T649] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 44.323288][ T649] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 97.150345][ T0] <TASK>
[ 44.329546][ T0] Masked ExtINT on CPU#2
[ 95.767675][ T0] Masked ExtINT on CPU#6
[ 95.597695][ T0] Masked ExtINT on CPU#2
[ 44.357295][ T649] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 97.153121][ T0] cpuidle_idle_call (arch/x86/include/asm/current.h:25 include/linux/sched/idle.h:33 include/linux/sched/idle.h:67 kernel/sched/idle.c:149 kernel/sched/idle.c:235)
[ 95.808544][ T649] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 44.363568][ T0] Masked ExtINT on CPU#3
[ 95.625417][ T649] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 44.402300][ T649] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 97.157882][ T0] ? __pfx_cpuidle_idle_call (kernel/sched/idle.c:173)
[ 95.814793][ T0] Masked ExtINT on CPU#7
[ 44.408556][ T0] Masked ExtINT on CPU#4
[ 95.631679][ T0] Masked ExtINT on CPU#3
[ 44.445282][ T649] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 97.163158][ T0] ? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:60)
[ 95.965121][ T252] smpboot: CPU 1 is now offline
[ 44.451534][ T0] Masked ExtINT on CPU#5
[ 95.675523][ T649] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 44.491303][ T649] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 97.168350][ T0] do_idle (kernel/sched/idle.c:330)
[ 96.109316][ T252] smpboot: CPU 2 is now offline
[ 44.497553][ T0] Masked ExtINT on CPU#6
[ 95.681842][ T0] Masked ExtINT on CPU#4
[ 44.536391][ T649] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 97.172160][ T0] cpu_startup_entry (kernel/sched/idle.c:427 (discriminator 1))
[ 96.244243][ T252] smpboot: CPU 3 is now offline
[ 44.542639][ T0] Masked ExtINT on CPU#7
[ 95.721433][ T649] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 44.699078][ T10] smpboot: CPU 1 is now offline
[ 97.176746][ T0] rest_init (init/main.c:718)
[ 96.397668][ T252] smpboot: CPU 4 is now offline

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250903/202509031529.464ff656-lkp@intel.com

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki