Hello,
with this commit we noticed a large config diff [1].
In these rcutorture tests the parent runs quite clean, while f388f60ca9 shows
various random issues.
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/i386-randconfig-r071-20250410/gcc-12/300s/default/tasks-tracing
fc2d5cbe541032e7 f388f60ca9041a95c9b3f157d31
---------------- ---------------------------
       fail:runs %reproduction     fail:runs
           |           |             |
            :500           30%       149:500  last_state.booting
            :500            7%        35:500  dmesg.BUG:kernel_hang_in_boot_stage
            :500            9%        45:500  dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
            :500           10%        51:500  dmesg.BUG:workqueue_lockup-pool
            :500            0%         1:500  dmesg.EIP:__timer_delete_sync
            :500            1%         5:500  dmesg.EIP:_raw_spin_unlock_irq
            :500            0%         2:500  dmesg.EIP:_raw_spin_unlock_irqrestore
            :500            0%         1:500  dmesg.EIP:console_emit_next_record
            :500            0%         1:500  dmesg.EIP:handle_softirqs
            :500            1%         3:500  dmesg.EIP:lock_acquire
            :500            0%         2:500  dmesg.EIP:lock_release
            :500            0%         1:500  dmesg.EIP:queue_delayed_work_on
            :500            9%        45:500  dmesg.EIP:timekeeping_notify
            :500            3%        14:500  dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
            :500            6%        32:500  dmesg.INFO:task_blocked_for_more_than#seconds
            :500            1%         3:500  dmesg.IP-Config:Auto-configuration_of_network_failed
            :500            9%        45:500  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
            :500           29%       146:500  dmesg.boot_failures
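For readers unfamiliar with the table layout, here is a minimal sketch of how the columns appear to relate. It assumes that an empty fail count (as in ":500") means zero failures and that %reproduction is the failure-rate difference between the commit and its parent, rounded to a whole percent. Both are assumptions about the report format, and the function names are made up for illustration.

```python
def parse_fail_runs(cell: str) -> tuple[int, int]:
    """Split a 'fail:runs' cell; an empty fail count is read as 0 (assumption)."""
    fails, runs = cell.split(":")
    return (int(fails) if fails else 0, int(runs))


def reproduction_pct(parent_cell: str, commit_cell: str) -> int:
    """Failure-rate difference between commit and parent, in whole percent.

    Assumed meaning of the %reproduction column, not taken from LKP itself.
    """
    parent_fails, parent_runs = parse_fail_runs(parent_cell)
    commit_fails, commit_runs = parse_fail_runs(commit_cell)
    delta = commit_fails / commit_runs - parent_fails / parent_runs
    return round(100 * delta)


if __name__ == "__main__":
    # Two rows from the table above: last_state.booting and boot_failures.
    print(reproduction_pct(":500", "149:500"))
    print(reproduction_pct(":500", "146:500"))
```

Under these assumptions the sketch agrees with the percentages shown in the table rows.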
We don't have enough knowledge to dig deeper into these issues, so we are
sending this report to ask whether they are related to the config diff,
and if so, whether this config diff is an expected result of this commit.
Our normal report is below, just FYI.
kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![swapper:#]" on:
commit: f388f60ca9041a95c9b3f157d316ed7c8f297e44 ("x86/cpu: Drop configuration options for early 64-bit CPUs")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master e618ee89561b6b0fdc69f79e6fd0c33375d3e6b4]
[test failed on linux-next/master 01c6df60d5d4ae00cd5c1648818744838bba7763]
in testcase: rcutorture
version:
with following parameters:
runtime: 300s
test: default
torture_type: tasks-tracing
config: i386-randconfig-r071-20250410
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202504211553.3ba9400-lkp@intel.com
[ 721.016745][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper:1]
[ 721.016779][ C0] CPU#0 Utilization every 96s during lockup:
[ 721.016779][ C0] #1: 39% system, 0% softirq, 0% hardirq, 0% idle
[ 721.016779][ C0] #2: 42% system, 0% softirq, 0% hardirq, 0% idle
[ 721.016779][ C0] #3: 47% system, 0% softirq, 0% hardirq, 0% idle
[ 721.016779][ C0] #4: 34% system, 0% softirq, 0% hardirq, 0% idle
[ 721.016779][ C0] #5: 32% system, 0% softirq, 0% hardirq, 0% idle
[ 721.016779][ C0] Modules linked in:
[ 721.016779][ C0] irq event stamp: 159506
[ 721.016779][ C0] hardirqs last enabled at (159505): timekeeping_notify (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:97 arch/x86/include/asm/irqflags.h:155 include/linux/stop_machine.h:154 include/linux/stop_machine.h:161 kernel/time/timekeeping.c:1521)
[ 721.016779][ C0] hardirqs last disabled at (159506): sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] softirqs last enabled at (159174): handle_softirqs (kernel/softirq.c:408 kernel/softirq.c:589)
[ 721.016779][ C0] softirqs last disabled at (159159): __do_softirq (kernel/softirq.c:596)
[ 721.016779][ C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.14.0-rc3-00037-gf388f60ca904 #1
[ 721.016779][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 721.016779][ C0] EIP: timekeeping_notify (kernel/time/timekeeping.c:1522)
[ 721.016779][ C0] Code: 5f e9 ff ff 8d 45 e8 e8 57 d4 ff ff 85 ff 74 16 8b 57 5c 85 d2 74 04 89 f8 ff d2 8b 87 88 00 00 00 e8 d5 3e ff ff 85 f6 75 9b <e8> 7f b9 00 00 31 c0 39 1d a4 70 14 84 0f 95 c0 f7 d8 8b 55 f0 2b
All code
========
0: 5f pop %rdi
1: e9 ff ff 8d 45 jmp 0x458e0005
6: e8 e8 57 d4 ff call 0xffffffffffd457f3
b: ff 85 ff 74 16 8b incl -0x74e98b01(%rbp)
11: 57 push %rdi
12: 5c pop %rsp
13: 85 d2 test %edx,%edx
15: 74 04 je 0x1b
17: 89 f8 mov %edi,%eax
19: ff d2 call *%rdx
1b: 8b 87 88 00 00 00 mov 0x88(%rdi),%eax
21: e8 d5 3e ff ff call 0xffffffffffff3efb
26: 85 f6 test %esi,%esi
28: 75 9b jne 0xffffffffffffffc5
2a:* e8 7f b9 00 00 call 0xb9ae <-- trapping instruction
2f: 31 c0 xor %eax,%eax
31: 39 1d a4 70 14 84 cmp %ebx,-0x7beb8f5c(%rip) # 0xffffffff841470db
37: 0f 95 c0 setne %al
3a: f7 d8 neg %eax
3c: 8b 55 f0 mov -0x10(%rbp),%edx
3f: 2b .byte 0x2b
Code starting with the faulting instruction
===========================================
0: e8 7f b9 00 00 call 0xb984
5: 31 c0 xor %eax,%eax
7: 39 1d a4 70 14 84 cmp %ebx,-0x7beb8f5c(%rip) # 0xffffffff841470b1
d: 0f 95 c0 setne %al
10: f7 d8 neg %eax
12: 8b 55 f0 mov -0x10(%rbp),%edx
15: 2b .byte 0x2b
[ 721.016779][ C0] EAX: 00026f11 EBX: 8316b7e0 ECX: 00000006 EDX: 7e26f13f
[ 721.016779][ C0] ESI: 00000200 EDI: 835e7220 EBP: 86d15ed8 ESP: 86d15ec0
[ 721.016779][ C0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00000206
[ 721.016779][ C0] CR0: 80050033 CR2: ffdaa000 CR3: 03a16000 CR4: 000406d0
[ 721.016779][ C0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 721.016779][ C0] DR6: fffe0ff0 DR7: 00000400
[ 721.016779][ C0] Call Trace:
[ 721.016779][ C0] ? show_regs (arch/x86/kernel/dumpstack.c:478)
[ 721.016779][ C0] ? watchdog_timer_fn (kernel/watchdog.c:767)
[ 721.016779][ C0] ? schedule_work (drivers/usb/core/hub.c:919)
[ 721.016779][ C0] ? __hrtimer_run_queues+0x12f/0x1cf
[ 721.016779][ C0] ? hrtimer_run_queues (kernel/time/hrtimer.c:2023)
[ 721.016779][ C0] ? update_process_times (kernel/time/timer.c:2458 kernel/time/timer.c:2514)
[ 721.016779][ C0] ? tick_periodic (kernel/time/tick-common.c:103)
[ 721.016779][ C0] ? tick_handle_periodic (kernel/time/tick-common.c:144)
[ 721.016779][ C0] ? vmware_sched_clock (arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] ? __sysvec_apic_timer_interrupt (arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1056)
[ 721.016779][ C0] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] ? handle_exception (arch/x86/entry/entry_32.S:1055)
[ 721.016779][ C0] ? vmware_sched_clock (arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] ? timekeeping_notify (kernel/time/timekeeping.c:1522)
[ 721.016779][ C0] ? vmware_sched_clock (arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] ? timekeeping_notify (kernel/time/timekeeping.c:1522)
[ 721.016779][ C0] __clocksource_select (kernel/time/clocksource.c:1077 (discriminator 1))
[ 721.016779][ C0] ? boot_override_clock (kernel/time/clocksource.c:1109)
[ 721.016779][ C0] clocksource_select (kernel/time/clocksource.c:1094)
[ 721.016779][ C0] clocksource_done_booting (kernel/time/clocksource.c:1118)
[ 721.016779][ C0] do_one_initcall (init/main.c:1257)
[ 721.016779][ C0] ? rdinit_setup (init/main.c:1305)
[ 721.016779][ C0] do_initcalls (init/main.c:1318 init/main.c:1335)
[ 721.016779][ C0] ? rest_init (init/main.c:1449)
[ 721.016779][ C0] kernel_init_freeable (init/main.c:1572)
[ 721.016779][ C0] kernel_init (init/main.c:1459)
[ 721.016779][ C0] ret_from_fork (arch/x86/kernel/process.c:154)
[ 721.016779][ C0] ? rest_init (init/main.c:1449)
[ 721.016779][ C0] ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
[ 721.016779][ C0] entry_INT80_32 (arch/x86/entry/entry_32.S:945)
[ 721.016779][ C0] Kernel panic - not syncing: softlockup: hung tasks
[ 721.016779][ C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Tainted: G L 6.14.0-rc3-00037-gf388f60ca904 #1
[ 721.016779][ C0] Tainted: [L]=SOFTLOCKUP
[ 721.016779][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 721.016779][ C0] Call Trace:
[ 721.016779][ C0] dump_stack_lvl (lib/dump_stack.c:124)
[ 721.016779][ C0] dump_stack (lib/dump_stack.c:130)
[ 721.016779][ C0] panic (kernel/panic.c:258 kernel/panic.c:375)
[ 721.016779][ C0] watchdog_timer_fn (kernel/watchdog.c:740)
[ 721.016779][ C0] ? schedule_work (drivers/usb/core/hub.c:919)
[ 721.016779][ C0] __hrtimer_run_queues+0x12f/0x1cf
[ 721.016779][ C0] hrtimer_run_queues (kernel/time/hrtimer.c:2023)
[ 721.016779][ C0] update_process_times (kernel/time/timer.c:2458 kernel/time/timer.c:2514)
[ 721.016779][ C0] tick_periodic (kernel/time/tick-common.c:103)
[ 721.016779][ C0] tick_handle_periodic (kernel/time/tick-common.c:144)
[ 721.016779][ C0] ? vmware_sched_clock (arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1056)
[ 721.016779][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
[ 721.016779][ C0] handle_exception (arch/x86/entry/entry_32.S:1055)
[ 721.016779][ C0] EIP: timekeeping_notify (kernel/time/timekeeping.c:1522)
[ 721.016779][ C0] Code: 5f e9 ff ff 8d 45 e8 e8 57 d4 ff ff 85 ff 74 16 8b 57 5c 85 d2 74 04 89 f8 ff d2 8b 87 88 00 00 00 e8 d5 3e ff ff 85 f6 75 9b <e8> 7f b9 00 00 31 c0 39 1d a4 70 14 84 0f 95 c0 f7 d8 8b 55 f0 2b
All code
========
0: 5f pop %rdi
1: e9 ff ff 8d 45 jmp 0x458e0005
6: e8 e8 57 d4 ff call 0xffffffffffd457f3
b: ff 85 ff 74 16 8b incl -0x74e98b01(%rbp)
11: 57 push %rdi
12: 5c pop %rsp
13: 85 d2 test %edx,%edx
15: 74 04 je 0x1b
17: 89 f8 mov %edi,%eax
19: ff d2 call *%rdx
1b: 8b 87 88 00 00 00 mov 0x88(%rdi),%eax
21: e8 d5 3e ff ff call 0xffffffffffff3efb
26: 85 f6 test %esi,%esi
28: 75 9b jne 0xffffffffffffffc5
2a:* e8 7f b9 00 00 call 0xb9ae <-- trapping instruction
2f: 31 c0 xor %eax,%eax
31: 39 1d a4 70 14 84 cmp %ebx,-0x7beb8f5c(%rip) # 0xffffffff841470db
37: 0f 95 c0 setne %al
3a: f7 d8 neg %eax
3c: 8b 55 f0 mov -0x10(%rbp),%edx
3f: 2b .byte 0x2b
Code starting with the faulting instruction
===========================================
0: e8 7f b9 00 00 call 0xb984
5: 31 c0 xor %eax,%eax
7: 39 1d a4 70 14 84 cmp %ebx,-0x7beb8f5c(%rip) # 0xffffffff841470b1
d: 0f 95 c0 setne %al
10: f7 d8 neg %eax
12: 8b 55 f0 mov -0x10(%rbp),%edx
15: 2b .byte 0x2b
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250421/202504211553.3ba9400-lkp@intel.com
[1]
--- /pkg/linux/i386-randconfig-r071-20250410/gcc-12/fc2d5cbe541032e74a66599ba843803cebbfed0e/.config 2025-04-15 15:41:11.316836213 +0800
+++ /pkg/linux/i386-randconfig-r071-20250410/gcc-12/f388f60ca9041a95c9b3f157d316ed7c8f297e44/.config 2025-04-15 15:41:17.009901645 +0800
@@ -321,7 +321,7 @@ CONFIG_ARCH_CPUIDLE_HALTPOLL=y
# CONFIG_PVH is not set
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y
-# CONFIG_M486SX is not set
+CONFIG_M486SX=y
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
@@ -333,7 +333,6 @@ CONFIG_PARAVIRT_CLOCK=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
-CONFIG_MK8=y
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
@@ -344,26 +343,24 @@ CONFIG_MK8=y
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
-# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
+CONFIG_X86_F00F_BUG=y
+CONFIG_X86_INVD_BUG=y
+CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_INTEL_USERCOPY=y
-CONFIG_X86_USE_PPRO_CHECKSUM=y
-CONFIG_X86_TSC=y
-CONFIG_X86_HAVE_PAE=y
-CONFIG_X86_CMPXCHG64=y
-CONFIG_X86_CMOV=y
-CONFIG_X86_MINIMUM_CPU_FAMILY=6
-CONFIG_X86_DEBUGCTLMSR=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
+CONFIG_CPU_SUP_CYRIX_32=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_HYGON=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_TRANSMETA_32=y
+CONFIG_CPU_SUP_UMC_32=y
CONFIG_CPU_SUP_ZHAOXIN=y
CONFIG_CPU_SUP_VORTEX_32=y
CONFIG_HPET_TIMER=y
@@ -410,7 +407,6 @@ CONFIG_X86_MSR=y
# CONFIG_X86_CPUID is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
-# CONFIG_HIGHMEM64G is not set
# CONFIG_VMSPLIT_3G is not set
# CONFIG_VMSPLIT_3G_OPT is not set
CONFIG_VMSPLIT_2G=y
@@ -418,7 +414,6 @@ CONFIG_VMSPLIT_2G=y
# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0x80000000
CONFIG_HIGHMEM=y
-# CONFIG_X86_PAE is not set
# CONFIG_X86_CPA_STATISTICS is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
@@ -427,6 +422,7 @@ CONFIG_ILLEGAL_POINTER_VALUE=0
# CONFIG_HIGHPTE is not set
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
+# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
@@ -472,8 +468,8 @@ CONFIG_USE_X86_SEG_SUPPORT=y
CONFIG_CC_HAS_SLS=y
CONFIG_CC_HAS_RETURN_THUNK=y
CONFIG_CC_HAS_ENTRY_PADDING=y
-CONFIG_FUNCTION_PADDING_CFI=0
-CONFIG_FUNCTION_PADDING_BYTES=4
+CONFIG_FUNCTION_PADDING_CFI=11
+CONFIG_FUNCTION_PADDING_BYTES=16
CONFIG_CPU_MITIGATIONS=y
# CONFIG_MITIGATION_RETPOLINE is not set
# CONFIG_MITIGATION_GDS is not set
@@ -741,7 +737,8 @@ CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y
CONFIG_HAVE_GCC_PLUGINS=y
# CONFIG_GCC_PLUGINS is not set
CONFIG_FUNCTION_ALIGNMENT_4B=y
-CONFIG_FUNCTION_ALIGNMENT=4
+CONFIG_FUNCTION_ALIGNMENT_16B=y
+CONFIG_FUNCTION_ALIGNMENT=16
# end of General architecture-dependent options
CONFIG_RT_MUTEXES=y
@@ -1114,7 +1111,6 @@ CONFIG_NFC_SHDLC=y
#
# Near Field Communication (NFC) devices
#
-# CONFIG_NFC_MEI_PHY is not set
# CONFIG_NFC_SIM is not set
# CONFIG_NFC_PORT100 is not set
# CONFIG_NFC_PN544_I2C is not set
@@ -1607,9 +1603,7 @@ CONFIG_EEPROM_IDT_89HPESX=y
# CONFIG_CB710_CORE is not set
CONFIG_SENSORS_LIS3_I2C=y
CONFIG_ALTERA_STAPL=y
-CONFIG_INTEL_MEI=y
-CONFIG_INTEL_MEI_ME=y
-# CONFIG_INTEL_MEI_TXE is not set
+# CONFIG_INTEL_MEI is not set
# CONFIG_VMWARE_VMCI is not set
CONFIG_ECHO=y
# CONFIG_MISC_ALCOR_PCI is not set
@@ -3412,7 +3406,6 @@ CONFIG_TQMX86_WDT=y
CONFIG_W83977F_WDT=y
CONFIG_MACHZ_WDT=y
CONFIG_SBC_EPX_C3_WATCHDOG=y
-# CONFIG_INTEL_MEI_WDT is not set
CONFIG_NI903X_WDT=y
# CONFIG_NIC7018_WDT is not set
# CONFIG_MEN_A21_WDT is not set
@@ -5752,7 +5745,6 @@ CONFIG_GENERIC_NET_UTILS=y
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_IOMAP=y
-CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
@@ -6186,7 +6178,6 @@ CONFIG_SAMPLE_VFIO_MDEV_MDPY=y
CONFIG_SAMPLE_VFIO_MDEV_MBOCHS=y
CONFIG_SAMPLE_ANDROID_BINDERFS=y
CONFIG_SAMPLE_VFS=y
-# CONFIG_SAMPLE_INTEL_MEI is not set
# CONFIG_SAMPLE_TPS6594_PFSM is not set
CONFIG_SAMPLE_WATCHDOG=y
CONFIG_SAMPLE_WATCH_QUEUE=y
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, Apr 21, 2025, at 10:12, kernel test robot wrote:
> Hello,
>
> by this commit, we notice big config diff [1]
>
> then in this rcutorture tests, parent runs quite clean, f388f60ca9 shows
> various random issues.
Thanks for the report!
From my initial reading, my patch most likely caught a preexisting bug,
but my patch itself is correct. It's worth investigating regardless;
at a minimum we should perhaps prevent an invalid configuration from
building or from booting.
> config: i386-randconfig-r071-20250410
Generally, I would not expect 'randconfig' kernels to pass all tests,
and what happened here is that removing the CONFIG_MK8 option made it
pick a completely different CPU.
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
The most relevant options here are
-# CONFIG_M486SX is not set
+CONFIG_M486SX=y
# CONFIG_SMP is not set
CONFIG_X86_GENERIC=y
In theory, setting X86_GENERIC should make a kernel built for an
older CPU work on any newer one. In practice, I'm not surprised
that this fails: While AMD K8 is ten years older than Intel Sandy
Bridge, they are architecturally still very similar. The i486SX
is another decade older, but its design is as far removed from
both K8 and Sandy Bridge as it gets.
It would be nice to not have to support 486sx any more.
We have discussed removing support for older CPUs without
TSC, FPU and CX8 in the past, but so far always kept them
around.
> [ 721.016779][ C0] hardirqs last disabled at (159506):
> sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049)
> [ 721.016779][ C0] softirqs last enabled at (159174): handle_softirqs
> (kernel/softirq.c:408 kernel/softirq.c:589)
> [ 721.016779][ C0] softirqs last disabled at (159159): __do_softirq
> (kernel/softirq.c:596)
> [ 721.016779][ C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted
> 6.14.0-rc3-00037-gf388f60ca904 #1
> [ 721.016779][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 721.016779][ C0] EIP: timekeeping_notify
> (kernel/time/timekeeping.c:1522)
Timekeeping code could be related: I see that CONFIG_X86_TSC
is disabled for i486SX configurations, so even if a TSC is present
in the emulated machine, it is not being used to measure time
accurately.
> -CONFIG_X86_CMPXCHG64=y
This could be another issue, if there is code that relies on
the cx8/cmpxchg8b feature being available. Since this is a non-SMP
kernel, this is less likely to be the cause of the problem.
Can you try what happens when you enable the two options, either
by changing CONFIG_M486SX to CONFIG_M586TSC, or with a patch
like the one below? Note that CONFIG_X86_CMPXCHG64 recently
got renamed to CONFIG_X86_CX8, but they are the exact same thing.
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f928cf6e3252..ac6cc69060f1 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -317,7 +317,6 @@ config X86_USE_PPRO_CHECKSUM
 
 config X86_TSC
 	def_bool y
-	depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MATOM) || X86_64
 
 config X86_HAVE_PAE
 	def_bool y
@@ -325,7 +324,6 @@ config X86_HAVE_PAE
 
 config X86_CX8
 	def_bool y
-	depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
Arnd
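As a reading aid for the patch above, here is a small sketch of what the two removed "depends on" lines evaluate to for a given CPU choice. The option lists are copied from the removed lines. X86_HAVE_PAE is passed in as a plain input because its own dependencies are not shown in the patch, and the helper names are made up for illustration.

```python
# CPU options named in the "depends on" line the patch removes from X86_TSC.
TSC_CPUS = {
    "MWINCHIP3D", "MCRUSOE", "MEFFICEON", "MCYRIXIII", "MK7", "MK6",
    "MPENTIUM4", "MPENTIUMM", "MPENTIUMIII", "MPENTIUMII", "M686",
    "M586MMX", "M586TSC", "MVIAC3_2", "MVIAC7", "MGEODEGX1", "MGEODE_LX",
    "MATOM",
}

# CPU options named (besides X86_HAVE_PAE) in the line removed from X86_CX8.
CX8_CPUS = {"M586TSC", "M586MMX", "MK6", "MK7", "MGEODEGX1", "MGEODE_LX"}


def x86_tsc(cpu: str, x86_64: bool = False) -> bool:
    """Value of X86_TSC under the old dependency for the selected CPU option."""
    return x86_64 or cpu in TSC_CPUS


def x86_cx8(cpu: str, have_pae: bool = False) -> bool:
    """Value of X86_CX8 under the old dependency.

    have_pae stands in for X86_HAVE_PAE, whose dependencies the patch
    does not show; it is an input here rather than something derived.
    """
    return have_pae or cpu in CX8_CPUS


if __name__ == "__main__":
    for cpu in ("M486SX", "M586TSC"):
        print(cpu, "X86_TSC =", x86_tsc(cpu), "X86_CX8 =", x86_cx8(cpu))
```

With M486SX selected and X86_HAVE_PAE off, as in the reported config, both options come out disabled under the old rules, while M586TSC enables both. The patch instead makes them unconditionally "y".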
hi, Arnd,
On Tue, Apr 22, 2025 at 12:16:33PM +0200, Arnd Bergmann wrote:
> On Mon, Apr 21, 2025, at 10:12, kernel test robot wrote:
> > Hello,
> >
> > by this commit, we notice big config diff [1]
> >
> > then in this rcutorture tests, parent runs quite clean, f388f60ca9 shows
> > various random issues.
>
> Thanks for the report!
>
> From my initial reading, my patch most likely caught a preexisting bug,
> but my patch itself is correct. It's worth investigating regardless,
> at the minimum we should perhaps prevent an invalid configuration from
> building or from booting.
>
> > config: i386-randconfig-r071-20250410
>
> Generally, I would not expect 'randconfig' kernels to pass all tests,
> and what happened here is that removing the CONFIG_MK8 option made it
> pick some completely different CPU
>
> > compiler: gcc-12
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> The most relevant options here are
>
> -# CONFIG_M486SX is not set
> +CONFIG_M486SX=y
> # CONFIG_SMP is not set
> CONFIG_X86_GENERIC=y
>
> In theory, setting X86_GENERIC should make a kernel built for an
> older CPU work on any newer one. In practice, I'm not surprised
> that this fails: While AMD K8 is ten years older than Intel Sandy
> Bridge, they are architecturally still very similar. The i486SX
> is another decade older, but its design is as far removed from
> both K8 and Sandy Bridge as it gets.
>
> It would be nice to not have to support 486sx any more.
> We have discussed removing support for older CPUs without
> TSC, FPU and CX8 in the past, but so far always kept them
> around.
>
> > [ 721.016779][ C0] hardirqs last disabled at (159506):
> > sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049)
> > [ 721.016779][ C0] softirqs last enabled at (159174): handle_softirqs
> > (kernel/softirq.c:408 kernel/softirq.c:589)
> > [ 721.016779][ C0] softirqs last disabled at (159159): __do_softirq
> > (kernel/softirq.c:596)
> > [ 721.016779][ C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted
> > 6.14.0-rc3-00037-gf388f60ca904 #1
> > [ 721.016779][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 721.016779][ C0] EIP: timekeeping_notify
> > (kernel/time/timekeeping.c:1522)
>
> Timekeeping code could be related, I see that CONFIG_X86_TSC
> is disabled for i486SX configurations, so even if a TSC is present
> in the emulated machine, it is not being used to measure time
> accurately.
>
> > -CONFIG_X86_CMPXCHG64=y
>
> This could be another issue, if there is code that relies on
> the cx8/cmpxchg8b feature to be used. Since this is a non-SMP
> kernel, this is less likely to be the cause of the problem.
thanks a lot for all these details!
>
> Can you try what happens when you enable the two options, either
> by changing CONFIG_M486SX to CONFIG_M586TSC, or with a patch
> like the one below? Note that CONFIG_X86_CMPXCHG64 recently
> got renamed to CONFIG_X86_CX8, but they are the exact same thing.
I applied your patch directly on top of f388f60ca9 (changing X86_CMPXCHG64
instead of X86_CX8, as you mentioned); the commit id is
c1f7ef63239411313163a7b1bff654236f48351c
After building, the config has the diff below relative to f388f60ca9:
--- f388f60ca9041a95c9b3f157d316ed7c8f297e44/.config 2025-04-15 15:41:17.009901645 +0800
+++ c1f7ef63239411313163a7b1bff654236f48351c/.config 2025-04-23 09:36:43.718421931 +0800
@@ -351,7 +351,9 @@ CONFIG_X86_F00F_BUG=y
CONFIG_X86_INVD_BUG=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_INTEL_USERCOPY=y
-CONFIG_X86_MINIMUM_CPU_FAMILY=4
+CONFIG_X86_TSC=y
+CONFIG_X86_CMPXCHG64=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
@@ -5745,6 +5747,7 @@ CONFIG_GENERIC_NET_UTILS=y
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_IOMAP=y
+CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
Running the same tests, it is now back to the clean status of
fc2d5cbe541032e7 (the parent of f388f60ca9).
(The statistics for fc2d5cbe541032e7 and f388f60ca9 differ slightly from
the data we shared last time because auto-cleanup logic in our service
removed some results that were suspected to be caused by problems in our
environment.)
fc2d5cbe541032e7 f388f60ca9041a95c9b3f157d31 c1f7ef63239411313163a7b1bff
---------------- --------------------------- ---------------------------
       fail:runs %reproduction     fail:runs %reproduction     fail:runs
           |           |             |             |             |
            :496           29%       145:494            0%          :500  last_state.booting
            :496            7%        35:494            0%          :500  dmesg.BUG:kernel_hang_in_boot_stage
            :496            9%        45:494            0%          :500  dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
            :496            0%         1:494            0%          :500  dmesg.EIP:__timer_delete_sync
            :496            1%         5:494            0%          :500  dmesg.EIP:_raw_spin_unlock_irq
            :496            0%         2:494            0%          :500  dmesg.EIP:_raw_spin_unlock_irqrestore
            :496            0%         1:494            0%          :500  dmesg.EIP:console_emit_next_record
            :496            0%         1:494            0%          :500  dmesg.EIP:handle_softirqs
            :496            1%         3:494            0%          :500  dmesg.EIP:lock_acquire
            :496            0%         2:494            0%          :500  dmesg.EIP:lock_release
            :496            0%         1:494            0%          :500  dmesg.EIP:queue_delayed_work_on
            :496            9%        45:494            0%          :500  dmesg.EIP:timekeeping_notify
            :496            3%        14:494            0%          :500  dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
            :496            6%        32:494            0%          :500  dmesg.INFO:task_blocked_for_more_than#seconds
            :496            9%        45:494            0%          :500  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
> index f928cf6e3252..ac6cc69060f1 100644
> --- a/arch/x86/Kconfig.cpu
> +++ b/arch/x86/Kconfig.cpu
> @@ -317,7 +317,6 @@ config X86_USE_PPRO_CHECKSUM
>
> config X86_TSC
> def_bool y
> - depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MATOM) || X86_64
>
> config X86_HAVE_PAE
> def_bool y
> @@ -325,7 +324,6 @@ config X86_HAVE_PAE
>
> config X86_CX8
> def_bool y
> - depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
>
> # this should be set for all -march=.. options where the compiler
> # generates cmov.
>
> Arnd
On Thu, Apr 24, 2025, at 04:12, Oliver Sang wrote:
> On Tue, Apr 22, 2025 at 12:16:33PM +0200, Arnd Bergmann wrote:
Cc: x86 and timekeeping maintainers, see
https://lore.kernel.org/lkml/202504211553.3ba9400-lkp@intel.com/
for the thread so far.
>> > [ 721.016779][ C0] hardirqs last disabled at (159506):
>> > sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049)
>> > [ 721.016779][ C0] softirqs last enabled at (159174): handle_softirqs
>> > (kernel/softirq.c:408 kernel/softirq.c:589)
>> > [ 721.016779][ C0] softirqs last disabled at (159159): __do_softirq
>> > (kernel/softirq.c:596)
>> > [ 721.016779][ C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted
>> > 6.14.0-rc3-00037-gf388f60ca904 #1
>> > [ 721.016779][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> > [ 721.016779][ C0] EIP: timekeeping_notify
>> > (kernel/time/timekeeping.c:1522)
>>
>> Timekeeping code could be related, I see that CONFIG_X86_TSC
>> is disabled for i486SX configurations, so even if a TSC is present
>> in the emulated machine, it is not being used to measure time
>> accurately.
>>
>> > -CONFIG_X86_CMPXCHG64=y
>>
>> This could be another issue, if there is code that relies on
>> the cx8/cmpxchg8b feature to be used. Since this is a non-SMP
>> kernel, this is less likely to be the cause of the problem.
>
> thanks a lot for all these details!
>
>>
>> Can you try what happens when you enable the two options, either
>> by changing CONFIG_M486SX to CONFIG_M586TSC, or with a patch
>> like the one below? Note that CONFIG_X86_CMPXCHG64 recently
>> got renamed to CONFIG_X86_CX8, but they are the exact same thing.
>
> I applied your patch directly upon f388f60ca9 (change for X86_CMPXCHG64
> instead of X86_CX8 as you mentioned), commit id is
> c1f7ef63239411313163a7b1bff654236f48351c
>
...
> by running same tests, now it backs to the clean status like
> fc2d5cbe541032e7 (parent of f388f60ca9)
Thanks for confirming. So a 486-targeted kernel still passes
your tests on modern hardware if we force TSC and CX8 to
be enabled, but the boot fails if the options are turned
off in Kconfig (though available in emulated hardware).
To be completely sure, you could re-run the same test with
just one of these enabled, but I'm rather sure that the TSC
is the root cause. I tried reproducing the problem locally
with your .config on a qemu/tcg emulation running on an
arm64 host, but this seems to run fine, including the
rcutorture tests.
Comparing my results with your log file, I see that your
crash happens while changing the clocksource:
Your dmesg:
[ 92.548514][ T1] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 721.016745][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper:1]
My dmesg:
[ 1.154511][ T1] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 1.157896][ T1] clocksource: Switched to clocksource tsc-early
There are also clearly some differences between TCG and KVM in
the handling of TSC, e.g. I get this warning from qemu itself
for the SandyBridge CPU:
qemu-system-i386: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
I tried a few other variations, including KVM on an x86 laptop
(using kvmclock or tsc-early clocksource), but none of them failed
the way yours did.
Arnd
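The comparison of the two dmesg excerpts above comes down to finding where the log goes silent. A minimal sketch of that check follows, assuming the usual "[ seconds.micros]" prefix on each line. The helper name is made up, and the sample lines are the two quoted from the failing boot.

```python
import re

# Matches the leading "[ seconds.micros]" timestamp of a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the largest jump
    between consecutive timestamped lines, or None if there are fewer than two."""
    stamped = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamped.append((float(match.group(1)), line))
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        if best is None or t1 - t0 > best[0]:
            best = (t1 - t0, before, after)
    return best


# The two lines quoted from the failing boot.
SAMPLE = [
    "[ 92.548514][ T1] hpet0: 3 comparators, 64-bit 100.000000 MHz counter",
    "[ 721.016745][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper:1]",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(f"{gap:.3f}s of silence after: {before}")
```

On the failing log the gap after the hpet0 line is roughly 628 seconds, in line with the "stuck for 626s" watchdog message, whereas the working boot prints the clocksource switch a few milliseconds after the same line.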
hi, Arnd,
On Thu, Apr 24, 2025 at 09:59:38AM +0200, Arnd Bergmann wrote:
> On Thu, Apr 24, 2025, at 04:12, Oliver Sang wrote:
> > On Tue, Apr 22, 2025 at 12:16:33PM +0200, Arnd Bergmann wrote:
>
> Cc: x86 and timekeeping maintainers, see
> https://lore.kernel.org/lkml/202504211553.3ba9400-lkp@intel.com/
> for the thread so far.
>
> >> > [ 721.016779][ C0] hardirqs last disabled at (159506):
> >> > sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049)
> >> > [ 721.016779][ C0] softirqs last enabled at (159174): handle_softirqs
> >> > (kernel/softirq.c:408 kernel/softirq.c:589)
> >> > [ 721.016779][ C0] softirqs last disabled at (159159): __do_softirq
> >> > (kernel/softirq.c:596)
> >> > [ 721.016779][ C0] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted
> >> > 6.14.0-rc3-00037-gf388f60ca904 #1
> >> > [ 721.016779][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX,
> >> > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >> > [ 721.016779][ C0] EIP: timekeeping_notify
> >> > (kernel/time/timekeeping.c:1522)
> >>
> >> Timekeeping code could be related, I see that CONFIG_X86_TSC
> >> is disabled for i486SX configurations, so even if a TSC is present
> >> in the emulated machine, it is not being used to measure time
> >> accurately.
> >>
> >> > -CONFIG_X86_CMPXCHG64=y
> >>
> >> This could be another issue, if there is code that relies on
> >> the cx8/cmpxchg8b feature to be used. Since this is a non-SMP
> >> kernel, this is less likely to be the cause of the problem.
> >
> > thanks a lot for all these details!
> >
> >>
> >> Can you try what happens when you enable the two options, either
> >> by changing CONFIG_M486SX to CONFIG_M586TSC, or with a patch
> >> like the one below? Note that CONFIG_X86_CMPXCHG64 recently
> >> got renamed to CONFIG_X86_CX8, but they are the exact same thing.
> >
> > I applied your patch directly upon f388f60ca9 (change for X86_CMPXCHG64
> > instead of X86_CX8 as you mentioned), commit id is
> > c1f7ef63239411313163a7b1bff654236f48351c
> >
> ...
> > by running same tests, now it backs to the clean status like
> > fc2d5cbe541032e7 (parent of f388f60ca9)
>
> Thanks for confirming. So a 486-targeted kernel still passes
> your tests on modern hardware if we force TSC and CX8 to
> be enabled, but the boot fails if the options are turned
> off in Kconfig (though available in emulated hardware).
>
> To be completely sure, you could re-run the same test with
> just one of these enabled, but I'm rather sure that the TSC
> is the root cause.
Just FYI, we reran the tests. With only X86_TSC enabled, the config diff is:
--- /pkg/linux/i386-randconfig-r071-20250410/gcc-12/f388f60ca9041a95c9b3f157d316ed7c8f297e44/.config 2025-04-15 15:41:17.009901645 +0800
+++ /pkg/linux/i386-randconfig-r071-20250410/gcc-12/801597ddaae3bdc15546df3d0eba6e9e4e157b8d/.config 2025-04-25 14:24:49.488257697 +0800
@@ -351,6 +351,7 @@ CONFIG_X86_F00F_BUG=y
CONFIG_X86_INVD_BUG=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_INTEL_USERCOPY=y
+CONFIG_X86_TSC=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
the various issues still exists:
fc2d5cbe541032e7 f388f60ca9041a95c9b3f157d31 801597ddaae3bdc15546df3d0eb
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :496        29%          145:494        31%          153:500  last_state.booting
           :496         7%           35:494        14%           71:500  dmesg.BUG:kernel_hang_in_boot_stage
           :496         0%             :494         0%            1:500  dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker##:#]
           :496         9%           45:494         5%           25:500  dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
           :496         0%            1:494         0%             :500  dmesg.EIP:__timer_delete_sync
           :496         1%            5:494         1%            3:500  dmesg.EIP:_raw_spin_unlock_irq
           :496         0%            2:494         0%            1:500  dmesg.EIP:_raw_spin_unlock_irqrestore
           :496         0%            1:494         0%             :500  dmesg.EIP:console_emit_next_record
           :496         0%             :494         0%            1:500  dmesg.EIP:finish_task_switch
           :496         0%            1:494         1%            4:500  dmesg.EIP:handle_softirqs
           :496         1%            3:494         1%            4:500  dmesg.EIP:lock_acquire
           :496         0%            2:494         0%            2:500  dmesg.EIP:lock_release
           :496         0%             :494         0%            1:500  dmesg.EIP:preempt_schedule_thunk
           :496         0%            1:494         0%             :500  dmesg.EIP:queue_delayed_work_on
           :496         0%             :494         0%            1:500  dmesg.EIP:rmqueue_pcplist
           :496         9%           45:494         5%           25:500  dmesg.EIP:timekeeping_notify
           :496         3%           14:494         2%            8:500  dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
           :496         0%             :494         0%            1:500  dmesg.INFO:rcu_preempt_self-detected_stall_on_CPU
           :496         6%           32:494         9%           46:500  dmesg.INFO:task_blocked_for_more_than#seconds
           :496         9%           45:494         5%           26:500  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
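
A note on reading the table: the %reproduction figures appear to be fail/runs
rounded to the nearest whole percent (for example 145/494 gives 29% and
153/500 gives 31%). The rounding rule is an assumption inferred from the
numbers above, not something the report states, and repro_percent() is a
hypothetical helper name. A minimal sketch in C:

```c
#include <assert.h>

/*
 * Failure rate in whole percent, rounded to nearest.
 * The rounding rule is inferred from the table, not documented.
 */
unsigned int repro_percent(unsigned int fails, unsigned int runs)
{
	assert(runs > 0);
	return (fails * 100 + runs / 2) / runs;
}
```

Checking a few rows this way (35/494, 71/500, 3/500) reproduces the 7%, 14%
and 1% shown in the table.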
With only X86_CMPXCHG64 enabled:
--- /pkg/linux/i386-randconfig-r071-20250410/gcc-12/f388f60ca9041a95c9b3f157d316ed7c8f297e44/.config 2025-04-15 15:41:17.009901645 +0800
+++ /pkg/linux/i386-randconfig-r071-20250410/gcc-12/dce28f73d4220df77c7fd6c41e854f1ef0af1d02/.config 2025-04-25 14:30:26.838037407 +0800
@@ -351,7 +351,8 @@ CONFIG_X86_F00F_BUG=y
CONFIG_X86_INVD_BUG=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_INTEL_USERCOPY=y
-CONFIG_X86_MINIMUM_CPU_FAMILY=4
+CONFIG_X86_CMPXCHG64=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_IA32_FEAT_CTL=y
CONFIG_X86_VMX_FEATURE_NAMES=y
CONFIG_CPU_SUP_INTEL=y
@@ -5745,6 +5746,7 @@ CONFIG_GENERIC_NET_UTILS=y
# CONFIG_PRIME_NUMBERS is not set
CONFIG_RATIONAL=y
CONFIG_GENERIC_IOMAP=y
+CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
CONFIG_ARCH_HAS_FAST_MULTIPLIER=y
CONFIG_ARCH_USE_SYM_ANNOTATIONS=y
The various issues are gone:
fc2d5cbe541032e7 f388f60ca9041a95c9b3f157d31 dce28f73d4220df77c7fd6c41e8
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :496        29%          145:494         0%             :500  last_state.booting
           :496         7%           35:494         0%             :500  dmesg.BUG:kernel_hang_in_boot_stage
           :496         9%           45:494         0%             :500  dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
           :496         0%            1:494         0%             :500  dmesg.EIP:__timer_delete_sync
           :496         1%            5:494         0%             :500  dmesg.EIP:_raw_spin_unlock_irq
           :496         0%            2:494         0%             :500  dmesg.EIP:_raw_spin_unlock_irqrestore
           :496         0%            1:494         0%             :500  dmesg.EIP:console_emit_next_record
           :496         0%            1:494         0%             :500  dmesg.EIP:handle_softirqs
           :496         1%            3:494         0%             :500  dmesg.EIP:lock_acquire
           :496         0%            2:494         0%             :500  dmesg.EIP:lock_release
           :496         0%            1:494         0%             :500  dmesg.EIP:queue_delayed_work_on
           :496         9%           45:494         0%             :500  dmesg.EIP:timekeeping_notify
           :496         3%           14:494         0%             :500  dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
           :496         6%           32:494         0%             :500  dmesg.INFO:task_blocked_for_more_than#seconds
           :496         9%           45:494         0%             :500  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> I tried reproducing the problem locally
> with your .config on a qemu/tcg emulation running on an
> arm64 host, but this seems to run fine, including the
> rcutorture tests.
>
> Comparing my results with your log file, I see that your
> crash happens while changing the clocksource:
>
> Your dmesg:
> [ 92.548514][ T1] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
> [ 721.016745][ C0] watchdog: BUG: soft lockup - CPU#0 stuck for 626s! [swapper:1]
>
> My dmesg:
> [ 1.154511][ T1] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
> [ 1.157896][ T1] clocksource: Switched to clocksource tsc-early
>
> There are also clearly some differences between TCG and KVM in
> the handling of TSC, e.g. I get this warning from qemu itself
> for the SandyBridge CPU:
>
> qemu-system-i386: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
>
> I tried a few other variations, including KVM on an x86 laptop
> (using kvmclock or tsc-early clocksource), but none of them failed
> the way yours did.
>
> Arnd
>
On Sat, 26 Apr 2025 at 22:49, Oliver Sang <oliver.sang@intel.com> wrote:
>
> We reran the tests. With only X86_TSC enabled, the various issues still
> exist. With only X86_CMPXCHG64 enabled, the various issues are gone.
Well, that's unexpected. I really didn't expect X86_CMPXCHG64 to make
any difference, since we should still use the cmpxchg64 instruction,
just with the alternative re-writing instead of directly.
Thanks for re-running the tests.
All the non-cmpxchg64 code sequences get replaced by the cmpxchg64
ones dynamically, so it all shouldn't matter one whit.
Except for during early boot. Because we do default to the old i386
sequences all the way *until* we do the alternatives replacement with
the good cmpxchg64 ones.
It does change code generation, in that we have to have that
alternative which now can be a call, so it's not a complete no-op, but
I'm still surprised.
And except for not using CMPXCHG_LOCKREF at all, but that should be
just a performance thing, and not noticeable during boot.
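
The early-boot window described above can be modelled in user space. This is
only a sketch under stated assumptions: all names (cmpxchg64_fallback,
cmpxchg64_native, apply_alternatives_model, cmpxchg64_model, using_fallback)
are hypothetical, a function pointer stands in for the kernel's in-place
instruction patching, and the fallback is a plain non-atomic compare-and-swap
that is only safe single-threaded. It does not reproduce the kernel's actual
stubs; it only shows that every caller before the switch runs different code
from every caller after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the pre-CX8 fallback: a plain, non-atomic
 * compare-and-swap. Only safe when called from a single thread.
 */
uint64_t cmpxchg64_fallback(uint64_t *ptr, uint64_t old, uint64_t new_val)
{
	uint64_t cur = *ptr;

	if (cur == old)
		*ptr = new_val;
	return cur;
}

/* Stand-in for the native instruction, using the GCC/Clang builtin. */
uint64_t cmpxchg64_native(uint64_t *ptr, uint64_t old, uint64_t new_val)
{
	__atomic_compare_exchange_n(ptr, &old, new_val, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	/* On failure the builtin writes the current value back into old. */
	return old;
}

typedef uint64_t (*cmpxchg64_fn)(uint64_t *, uint64_t, uint64_t);

/* "Early boot": the fallback is the default until the switch has run. */
static cmpxchg64_fn cmpxchg64_impl = cmpxchg64_fallback;

/* Model of the one-time switch; cpu_has_cx8 mimics a CPU feature bit. */
void apply_alternatives_model(bool cpu_has_cx8)
{
	if (cpu_has_cx8)
		cmpxchg64_impl = cmpxchg64_native;
}

/* What callers use; returns the value found at *ptr before the operation. */
uint64_t cmpxchg64_model(uint64_t *ptr, uint64_t old, uint64_t new_val)
{
	assert(ptr);
	return cmpxchg64_impl(ptr, old, new_val);
}

bool using_fallback(void)
{
	return cmpxchg64_impl == cmpxchg64_fallback;
}
```

In this model, any cmpxchg64_model() call made before
apply_alternatives_model(true) exercises the fallback even though the
"hardware" supports the native path, which is the only window in which the
two configurations should behave differently.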
Hmm...
I'd love to understand why X86_CMPXCHG64 apparently matters, but I
can't convince myself that it's worth really pursuing.
Linus
On April 27, 2025 9:39:55 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Sat, 26 Apr 2025 at 22:49, Oliver Sang <oliver.sang@intel.com> wrote:
>>
>> We reran the tests. if only enable X86_TSC, the various issues still
>> exists. if only enable X86_CMPXCHG64, various issues gone.
>
>Well, that's unexpected. I really didn't expect X86_CMPXCHG64 to make
>any difference, since we should still use the cmpxchg64 instruction,
>just with the alternative re-writing instead of directly.
>
>Thanks for re-running the tests.
>
>All the non-cmpxchg64 code sequences get replaced by the cmpxchg64
>ones dynamically, so it all shouldn't matter one whit.
>
>Except for during early boot. Because we do default to the old i386
>sequences all the way *until* we do the alternatives replacement with
>the good cmpxchg64 ones.
>
>It does change code generation, in that we have to have that
>alternative which now can be a call, so it's not a complete no-op, but
>I'm still surprised.
>
>And except for not using CMPXCHG_LOCKREF at all, but that should be
>just a performance thing, and not noticeable during boot.
>
>Hmm...
>
>I'd love to understand why X86_CMPXCHG64 apparently matters, but I
>can't convince myself that it's worth really pursuing.
>
>              Linus
Sounds like the fallback stubs are subtly broken.
Personally I would have thought cmpxchg64 to be the biggest win in raising
the baseline to the i586 ISA.
On Sun, Apr 27, 2025, at 07:48, Oliver Sang wrote:
> On Thu, Apr 24, 2025 at 09:59:38AM +0200, Arnd Bergmann wrote:
>>
>> Thanks for confirming. So a 486-targeted kernel still passes
>> your tests on modern hardware if we force TSC and CX8 to
>> be enabled, but the boot fails if the options are turned
>> off in Kconfig (though available in emulated hardware).
>>
>> To be completely sure, you could re-run the same test with
>> just one of these enabled, but I'm rather sure that the TSC
>> is the root cause.
>
> Just FYI, we reran the tests. With only X86_TSC enabled, the config diff is:
...
> The various issues still exist:
>
> With only X86_CMPXCHG64 enabled:
...
> The various issues are gone:
Interesting, so cx8 was indeed a problem. I would still assume
that TSC caused the boot panic I cited, but it looks like CX8
caused all the other symptoms.
At the minimum this strengthens the case for the consensus of
dropping support for all pre-586 cores.
Thanks a lot for confirming!
Arnd
On Thu, 24 Apr 2025 at 01:01, Arnd Bergmann <arnd@arndb.de> wrote:
>
> Thanks for confirming. So a 486-targeted kernel still passes
> your tests on modern hardware if we force TSC and CX8 to
> be enabled, but the boot fails if the options are turned
> off in Kconfig (though available in emulated hardware).
I wouldn't expect CX8 to really matter - it causes us to generate
extra code to pick one over the other, but on modern hardware we'll
still always then dynamically pick the cmpxchg8b instruction.
Could it trigger bugs in our alternatives, or some miscompilation due
to the extra complexity? Sure. But it does sound unlikely.
> To be completely sure, you could re-run the same test with
> just one of these enabled, but I'm rather sure that the TSC
> is the root cause.
Agreed.
Particularly when the lockup is then in timekeeping_notify() during
the initial initcalls -> clocksource_select(), I'm pretty sure this is
purely about TSC.
That said, maybe the problem is in the watchdog logic, because
clocksource_done_booting() is what starts the watchdog thread.
So it might be the watchdog code itself that then gets confused
(because of some "don't use tsc" case that never gets any testing in
real life) and triggers immediately - and then points the finger at
the clocksource code only because that's what is still running.
Because CONFIG_X86_TSC does cause some oddities: we end up still
*using* the TSC for many things if the hardware supports it (which
modern hardware obviously does), but then other things get disabled
entirely.
For example, this:
    /*
     * Boot-time check whether the TSCs are synchronized across
     * all CPUs/cores:
     */
    #ifdef CONFIG_X86_TSC
    extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
    extern void tsc_verify_tsc_adjust(bool resume);
    extern void check_tsc_sync_target(void);
    #else
    static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
    static inline void tsc_verify_tsc_adjust(bool resume) { }
    static inline void check_tsc_sync_target(void) { }
    #endif
So that tsc_store_and_check_tsc_adjust() thing etc never gets run,
even though we actually *do* use TSC for get_cycles() and friends,
because *that* code checks the runtime status too:
Now, none of that should matter - because all *those* things are about
details that simply aren't relevant for any of this case - but maybe
there is some other situation that has similar "I'm actually using the
TSC through get_cycles(), but I didn't do some setup because X86_TSC
wasn't on.."
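
The kind of mismatch described above (setup compiled out, feature still used
at runtime) can be illustrated with a small user-space model. Everything here
is hypothetical: MODEL_CONFIG_TSC stands in for CONFIG_X86_TSC, hw_has_tsc for
the runtime CPU feature bit, and the model_* functions are not kernel APIs.
It is a sketch of the pattern, not of the actual TSC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compile-time switch; left undefined to model X86_TSC=n. */
/* #define MODEL_CONFIG_TSC 1 */

static bool hw_has_tsc = true;  /* runtime: what the (emulated) CPU reports */
static bool tsc_setup_done;     /* set only by the compile-time-gated path */
static uint64_t fake_counter;   /* stand-in for the hardware counter */

#ifdef MODEL_CONFIG_TSC
void model_tsc_setup(void) { tsc_setup_done = true; }
#else
void model_tsc_setup(void) { }  /* compiled out, like the inline stubs above */
#endif

/* Reader that honours the runtime bit even when the config option is off. */
uint64_t model_get_cycles(void)
{
#ifndef MODEL_CONFIG_TSC
	if (!hw_has_tsc)
		return 0;
#endif
	return ++fake_counter;
}

/* True when the counter is being read although its setup never ran. */
bool model_used_without_setup(void)
{
	model_tsc_setup();
	return model_get_cycles() != 0 && !tsc_setup_done;
}
```

With MODEL_CONFIG_TSC undefined and hw_has_tsc true, the model reports that
the counter is in use without its setup step, which is the shape of problem
being speculated about here.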
I really get the feeling that it's time to leave i486 support behind.
There's zero real reason for anybody to waste one second of
development effort on this kind of issue.
Linus
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> I really get the feeling that it's time to leave i486 support behind.
> There's zero real reason for anybody to waste one second of
> development effort on this kind of issue.
Fully agreed!
And to turn this idea into code, here's a very raw RFC series that
starts removing non-TSC 586 and 486 code and related support code from
the x86 architecture, with the goal to make TSC and CX8 (CMPXCHG8B)
support unconditionally available:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/cpu
The full diffstat is nice, primarily due to the removal of the math-emu
library:
83 files changed, 30 insertions(+), 14683 deletions(-)
But even without the math-emu/ removal and the drivers/ pruning it's a
substantial simplification:
20 files changed, 29 insertions(+), 629 deletions(-)
The patches most relevant to this discussion should be:
x86/cpu: Remove M486/M486SX/ELAN support
...
x86/cpu: Remove TSC-less CONFIG_M586 support
x86/cpu: Make CONFIG_X86_TSC unconditional
x86/cpu: Make CONFIG_X86_CX8 unconditional
x86/percpu: Remove !CONFIG_X86_CX8 methods
x86/atomics: Remove !CONFIG_X86_CX8 methods
If there are no big objections about the scope of removal I'll finish it
by removing the non-TSC complications as well, and send out the series
to lkml for further review.
( Note that some of the patches in there are still WIP, as the branch
name suggests. )
Thanks,
Ingo
================>
Ingo Molnar (17):
x86/cpu: Remove M486/M486SX/ELAN support
x86/cpu: Remove the CONFIG_X86_INVD_BUG quirk
x86/cpu, cpufreq: Remove AMD ELAN support
x86/fpu: Remove MATH_EMULATION and related glue code
x86/fpu: Remove the 'no387' boot option
x86/fpu: Remove the math-emu/ FPU emulation library
x86/platform: Remove CONFIG_X86_RDC321X support
arch/x86, gpio: Remove GPIO_RDC321X support
arch/x86, watchdog: Remove the RDC321X_WDT watchdog driver
arch/x86, mfd: Remove MFD_RDC321X support
x86/reboot: Remove the RDC321X reboot quirk
x86/cpu: Remove CPU_SUP_UMC_32 support
x86/cpu: Remove TSC-less CONFIG_M586 support
x86/cpu: Make CONFIG_X86_TSC unconditional
x86/cpu: Make CONFIG_X86_CX8 unconditional
x86/percpu: Remove !X86_CX8 methods
x86/atomics: Remove !CONFIG_X86_CX8 methods
Documentation/admin-guide/kernel-parameters.txt | 4 -
arch/x86/Kconfig | 71 +-
arch/x86/Kconfig.cpu | 73 +-
arch/x86/Kconfig.cpufeatures | 2 -
arch/x86/Makefile | 1 -
arch/x86/Makefile_32.cpu | 6 -
arch/x86/include/asm/asm-prototypes.h | 4 -
arch/x86/include/asm/atomic64_32.h | 17 +-
arch/x86/include/asm/cmpxchg_32.h | 86 +-
arch/x86/include/asm/fpu/api.h | 6 -
arch/x86/include/asm/percpu.h | 6 +-
arch/x86/include/asm/vermagic.h | 8 -
arch/x86/kernel/cpu/common.c | 7 -
arch/x86/kernel/cpu/umc.c | 26 -
arch/x86/kernel/fpu/core.c | 5 -
arch/x86/kernel/fpu/init.c | 9 +-
arch/x86/kernel/reboot_fixups_32.c | 14 -
arch/x86/kernel/traps.c | 21 -
arch/x86/lib/Makefile | 4 -
arch/x86/lib/atomic64_386_32.S | 195 ---
arch/x86/lib/cmpxchg8b_emu.S | 97 --
arch/x86/math-emu/Makefile | 30 -
arch/x86/math-emu/README | 427 ------
arch/x86/math-emu/control_w.h | 46 -
arch/x86/math-emu/div_Xsig.S | 367 -----
arch/x86/math-emu/div_small.S | 48 -
arch/x86/math-emu/errors.c | 686 ----------
arch/x86/math-emu/exception.h | 51 -
arch/x86/math-emu/fpu_arith.c | 153 ---
arch/x86/math-emu/fpu_asm.h | 32 -
arch/x86/math-emu/fpu_aux.c | 267 ----
arch/x86/math-emu/fpu_emu.h | 218 ---
arch/x86/math-emu/fpu_entry.c | 718 ----------
arch/x86/math-emu/fpu_etc.c | 136 --
arch/x86/math-emu/fpu_proto.h | 157 ---
arch/x86/math-emu/fpu_system.h | 130 --
arch/x86/math-emu/fpu_tags.c | 116 --
arch/x86/math-emu/fpu_trig.c | 1649 -----------------------
arch/x86/math-emu/get_address.c | 401 ------
arch/x86/math-emu/load_store.c | 322 -----
arch/x86/math-emu/mul_Xsig.S | 179 ---
arch/x86/math-emu/poly.h | 115 --
arch/x86/math-emu/poly_2xm1.c | 146 --
arch/x86/math-emu/poly_atan.c | 209 ---
arch/x86/math-emu/poly_l2.c | 245 ----
arch/x86/math-emu/poly_sin.c | 379 ------
arch/x86/math-emu/poly_tan.c | 213 ---
arch/x86/math-emu/polynom_Xsig.S | 137 --
arch/x86/math-emu/reg_add_sub.c | 334 -----
arch/x86/math-emu/reg_compare.c | 479 -------
arch/x86/math-emu/reg_constant.c | 123 --
arch/x86/math-emu/reg_constant.h | 26 -
arch/x86/math-emu/reg_convert.c | 47 -
arch/x86/math-emu/reg_divide.c | 183 ---
arch/x86/math-emu/reg_ld_str.c | 1220 -----------------
arch/x86/math-emu/reg_mul.c | 116 --
arch/x86/math-emu/reg_norm.S | 150 ---
arch/x86/math-emu/reg_round.S | 711 ----------
arch/x86/math-emu/reg_u_add.S | 169 ---
arch/x86/math-emu/reg_u_div.S | 474 -------
arch/x86/math-emu/reg_u_mul.S | 150 ---
arch/x86/math-emu/reg_u_sub.S | 274 ----
arch/x86/math-emu/round_Xsig.S | 142 --
arch/x86/math-emu/shr_Xsig.S | 89 --
arch/x86/math-emu/status_w.h | 68 -
arch/x86/math-emu/version.h | 12 -
arch/x86/math-emu/wm_shrx.S | 207 ---
arch/x86/math-emu/wm_sqrt.S | 472 -------
drivers/cpufreq/Kconfig.x86 | 26 -
drivers/cpufreq/Makefile | 2 -
drivers/cpufreq/elanfreq.c | 227 ----
drivers/cpufreq/sc520_freq.c | 137 --
drivers/gpio/Kconfig | 8 -
drivers/gpio/Makefile | 1 -
drivers/gpio/gpio-rdc321x.c | 197 ---
drivers/mfd/Kconfig | 9 -
drivers/mfd/Makefile | 1 -
drivers/mfd/rdc321x-southbridge.c | 96 --
drivers/watchdog/Kconfig | 11 -
drivers/watchdog/Makefile | 1 -
drivers/watchdog/rdc321x_wdt.c | 281 ----
include/linux/mfd/rdc321x.h | 27 -
lib/atomic64_test.c | 4 +-
83 files changed, 30 insertions(+), 14683 deletions(-)
delete mode 100644 arch/x86/kernel/cpu/umc.c
delete mode 100644 arch/x86/lib/atomic64_386_32.S
delete mode 100644 arch/x86/lib/cmpxchg8b_emu.S
delete mode 100644 arch/x86/math-emu/Makefile
delete mode 100644 arch/x86/math-emu/README
delete mode 100644 arch/x86/math-emu/control_w.h
delete mode 100644 arch/x86/math-emu/div_Xsig.S
delete mode 100644 arch/x86/math-emu/div_small.S
delete mode 100644 arch/x86/math-emu/errors.c
delete mode 100644 arch/x86/math-emu/exception.h
delete mode 100644 arch/x86/math-emu/fpu_arith.c
delete mode 100644 arch/x86/math-emu/fpu_asm.h
delete mode 100644 arch/x86/math-emu/fpu_aux.c
delete mode 100644 arch/x86/math-emu/fpu_emu.h
delete mode 100644 arch/x86/math-emu/fpu_entry.c
delete mode 100644 arch/x86/math-emu/fpu_etc.c
delete mode 100644 arch/x86/math-emu/fpu_proto.h
delete mode 100644 arch/x86/math-emu/fpu_system.h
delete mode 100644 arch/x86/math-emu/fpu_tags.c
delete mode 100644 arch/x86/math-emu/fpu_trig.c
delete mode 100644 arch/x86/math-emu/get_address.c
delete mode 100644 arch/x86/math-emu/load_store.c
delete mode 100644 arch/x86/math-emu/mul_Xsig.S
delete mode 100644 arch/x86/math-emu/poly.h
delete mode 100644 arch/x86/math-emu/poly_2xm1.c
delete mode 100644 arch/x86/math-emu/poly_atan.c
delete mode 100644 arch/x86/math-emu/poly_l2.c
delete mode 100644 arch/x86/math-emu/poly_sin.c
delete mode 100644 arch/x86/math-emu/poly_tan.c
delete mode 100644 arch/x86/math-emu/polynom_Xsig.S
delete mode 100644 arch/x86/math-emu/reg_add_sub.c
delete mode 100644 arch/x86/math-emu/reg_compare.c
delete mode 100644 arch/x86/math-emu/reg_constant.c
delete mode 100644 arch/x86/math-emu/reg_constant.h
delete mode 100644 arch/x86/math-emu/reg_convert.c
delete mode 100644 arch/x86/math-emu/reg_divide.c
delete mode 100644 arch/x86/math-emu/reg_ld_str.c
delete mode 100644 arch/x86/math-emu/reg_mul.c
delete mode 100644 arch/x86/math-emu/reg_norm.S
delete mode 100644 arch/x86/math-emu/reg_round.S
delete mode 100644 arch/x86/math-emu/reg_u_add.S
delete mode 100644 arch/x86/math-emu/reg_u_div.S
delete mode 100644 arch/x86/math-emu/reg_u_mul.S
delete mode 100644 arch/x86/math-emu/reg_u_sub.S
delete mode 100644 arch/x86/math-emu/round_Xsig.S
delete mode 100644 arch/x86/math-emu/shr_Xsig.S
delete mode 100644 arch/x86/math-emu/status_w.h
delete mode 100644 arch/x86/math-emu/version.h
delete mode 100644 arch/x86/math-emu/wm_shrx.S
delete mode 100644 arch/x86/math-emu/wm_sqrt.S
delete mode 100644 drivers/cpufreq/elanfreq.c
delete mode 100644 drivers/cpufreq/sc520_freq.c
delete mode 100644 drivers/gpio/gpio-rdc321x.c
delete mode 100644 drivers/mfd/rdc321x-southbridge.c
delete mode 100644 drivers/watchdog/rdc321x_wdt.c
delete mode 100644 include/linux/mfd/rdc321x.h
On Thu, Apr 24, 2025, at 19:54, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> I really get the feeling that it's time to leave i486 support behind.
>> There's zero real reason for anybody to waste one second of
>> development effort on this kind of issue.
>
> Fully agreed!
>
> And to turn this idea into code, here's a very raw RFC series that
> starts removing non-TSC 586 and 486 code and related support code from
> the x86 architecture, with the goal to make TSC and CX8 (CMPXCHG8B)
> support unconditionally available:
Yes, makes sense. I had considered doing something like this in
my cleanup for large machines, but decided to stop there
because I know that there are users that love their museum pieces.
For embedded systems, I'm quite sure that the AMD Élan, SIS 55x,
RDC321x and Vortex86SX have ended their useful life. You may
still be able to buy Vortex86SX machines and they are probably
still running somewhere, but the entire point of those machines
is to run old software. DM&P provides patches for linux-2.6.29
and Windows CE 6.0 for the SX chips.
The desktop Socket 7 clones without CX8/TSC all got discontinued
over 25 years ago, and they were rare even then.
> x86/platform: Remove CONFIG_X86_RDC321X support
> arch/x86, gpio: Remove GPIO_RDC321X support
> arch/x86, watchdog: Remove the RDC321X_WDT watchdog driver
> arch/x86, mfd: Remove MFD_RDC321X support
> x86/reboot: Remove the RDC321X reboot quirk
I'm not sure about the RDC321X bits. Obviously the original
321x/861x/vortex86sx chips are obsolete and can be removed,
but the product line is still actively developed by RDC and
DM&P, and I suspect that some of the drivers are still used
on 586tsc-class (vortex86dx, vortex86mx) and 686-class
(vortex86dx3, vortex86ex) SoCs that do run modern kernels and
get updates.
> x86/cpu: Remove CPU_SUP_UMC_32 support
> x86/cpu: Remove TSC-less CONFIG_M586 support
I think Winchip6 (486-class, no tsc, no cx8) and Winchip3D
(486-class, with tsc but no cx8) need to go as well then.
At this point, maybe we can consider removing
CONFIG_X86_GENERIC and just always build kernels that work
across a wide set of CPUs: Only CMOV and PAE still require a
CPU with the hardware support, and X86_L1_CACHE_SHIFT needs to
be at least 6 (64 byte) for compatibility, but everything
else should just be a tuning option.
Arnd
* Arnd Bergmann <arnd@arndb.de> wrote:
> > x86/platform: Remove CONFIG_X86_RDC321X support
> > arch/x86, gpio: Remove GPIO_RDC321X support
> > arch/x86, watchdog: Remove the RDC321X_WDT watchdog driver
> > arch/x86, mfd: Remove MFD_RDC321X support
> > x86/reboot: Remove the RDC321X reboot quirk
>
> I'm not sure about the RDC321X bits. Obviously the original
> 321x/861x/vortex86sx chips are obsolete and can be removed,
> but the product line is still actively developed by RDC and
> DM&P, and I suspect that some of the drivers are still used
> on 586tsc-class (vortex86dx, vortex86mx) and 686-class
> (vortex86dx3, vortex86ex) SoCs that do run modern kernels and
> get updates.
So CONFIG_X86_RDC321X actively selects M486:
  +++ b/arch/x86/Kconfig

  config X86_RDC321X
         bool "RDC R-321x SoC"
         depends on X86_32
         depends on X86_EXTENDED_PLATFORM
         select M486
         ^^^^^^^^^^^
         select X86_REBOOTFIXUPS
But indeed the other drivers are not dependent on M486, at least
overtly:
arch/x86, mfd: Remove MFD_RDC321X support
arch/x86, watchdog: Remove the RDC321X_WDT watchdog driver
arch/x86, gpio: Remove GPIO_RDC321X support
Although the watchdog driver has this indirect dependency:
drivers/watchdog/Kconfig: depends on X86_RDC321X || COMPILE_TEST
But the 486 kernel would work on any 586/686 upgraded boards as well.
Anyway, I've dropped the mfd/watchdog/gpio removal patches, no harm in
keeping these drivers, and I've switched the watchdog driver over to
X86_32:
  config RDC321X_WDT
         tristate "RDC R-321x SoC watchdog"
         depends on X86_32 || COMPILE_TEST
There's also no harm in keeping the southbridge reboot quirk I suppose,
so I've dropped this as well:
x86/reboot: Remove the RDC321X reboot quirk
> > x86/cpu: Remove CPU_SUP_UMC_32 support
> > x86/cpu: Remove TSC-less CONFIG_M586 support
>
> I think Winchip6 (486-class, no tsc, no cx8) and Winchip3D
> (486-class, with tsc but no cx8) need to go as well then.
Okay, agreed, I've added this patch to the tree:
bf82539ad9f6 x86/cpu: Remove CONFIG_MWINCHIP3D/MWINCHIPC6
arch/x86/Kconfig.cpu | 28 ++++------------------------
arch/x86/Makefile_32.cpu | 2 --
arch/x86/include/asm/vermagic.h | 4 ----
3 files changed, 4 insertions(+), 30 deletions(-)
> At this point, maybe we can consider removing CONFIG_X86_GENERIC and
> just always build kernels that work across a wide set of CPUs: Only
> CMOV and PAE still require a CPU with the hardware support, and
> X86_L1_CACHE_SHIFT needs to be at least 6 (64 byte) for
> compatibility, but everything else should just be a tuning option.
Agreed.
Thanks,
Ingo
On Fri, Apr 25, 2025, at 09:40, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@arndb.de> wrote:
>
>> > x86/platform: Remove CONFIG_X86_RDC321X support
>> > arch/x86, gpio: Remove GPIO_RDC321X support
>> > arch/x86, watchdog: Remove the RDC321X_WDT watchdog driver
>> > arch/x86, mfd: Remove MFD_RDC321X support
>> > x86/reboot: Remove the RDC321X reboot quirk
>>
>> I'm not sure about the RDC321X bits. Obviously the original
>> 321x/861x/vortex86sx chips are obsolete and can be removed,
>> but the product line is still actively developed by RDC and
>> DM&P, and I suspect that some of the drivers are still used
>> on 586tsc-class (vortex86dx, vortex86mx) and 686-class
>> (vortex86dx3, vortex86ex) SoCs that do run modern kernels and
>> get updates.
>
> So CONFIG_X86_RDC321X actively selects M486:
>
> +++ b/arch/x86/Kconfig
>
> config X86_RDC321X
> bool "RDC R-321x SoC"
> depends on X86_32
> depends on X86_EXTENDED_PLATFORM
> select M486
> ^^^^^^^^^^^
> select X86_REBOOTFIXUPS
Right, when the code got added, it was certainly for that
specific chip, which I think is 486SX compatible.
The 'select M486' here doesn't actually do anything because
Kconfig silently ignores 'select' for 'choice' symbols.
> But indeed the other drivers are not dependent on M486, at least
> overtly:
>
> arch/x86, mfd: Remove MFD_RDC321X support
> arch/x86, watchdog: Remove the RDC321X_WDT watchdog driver
> arch/x86, gpio: Remove GPIO_RDC321X support
>
> Although the watchdog driver has this indirect dependency:
>
> drivers/watchdog/Kconfig: depends on X86_RDC321X || COMPILE_TEST
> Anyway, I've dropped the mfd/watchdog/gpio removal patches, no harm in
> keeping these drivers.
Thanks. We should still revisit all these separately and see which
ones are used on more modern RDC/Vortex86 chips, as the relation
between the brands isn't well documented.
I found an older lspci output from Xcore86MX/Vortex86MX showing
that it uses an RDC R6021/R6036 bridge instead of R6020/R6030
on the RDC321x:
https://lore.kernel.org/all/4CC80AF3.9040708@croler.net/
The Vortex86DX (586tsc compatible) datasheet in turn lists
an R6021/R6031, which means the driver won't work out of the box,
but it's probably not far off either if someone just adds
the PCI ID.
Clearly nobody has done that so far, which would indicate that
not a lot of people run vortex86 /and/ realize it's related
to rdc321x.
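
Why the driver "won't work out of the box" on the newer bridges can be
sketched in user space: a PCI driver binds only to the vendor/device pairs
listed in its ID table. The structure and function names below are
simplified, hypothetical stand-ins (not the kernel's struct pci_device_id),
and the numeric IDs are assumptions: the vendor ID 0x17f3 and the idea that
bridge names map directly to device IDs (R6030 as 0x6030, R6031 as 0x6031)
are not confirmed anywhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PCI ID table entry. */
struct model_pci_id {
	uint16_t vendor;
	uint16_t device;
};

#define MODEL_VENDOR_RDC 0x17f3 /* assumed vendor ID */

/* IDs the hypothetical driver currently claims. */
static const struct model_pci_id model_ids[] = {
	{ MODEL_VENDOR_RDC, 0x6030 }, /* assumed ID for the R6030 bridge */
};

/* Linear scan of an ID table for an exact vendor/device match. */
bool model_match(const struct model_pci_id *table, size_t n,
		 uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].vendor == vendor && table[i].device == device)
			return true;
	return false;
}

/* Would the hypothetical driver bind to this device? */
bool model_driver_binds(uint16_t vendor, uint16_t device)
{
	return model_match(model_ids,
			   sizeof(model_ids) / sizeof(model_ids[0]),
			   vendor, device);
}
```

In this model the driver binds to the assumed R6030 ID but not to an assumed
R6031 ID until a second table entry is added, which is the one-line change
alluded to above.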
> and I've switched the watchdog driver over to X86_32:
>
> config RDC321X_WDT
> tristate "RDC R-321x SoC watchdog"
> depends on X86_32 || COMPILE_TEST
How about 'CPU_SUP_VORTEX_32 || COMPILE_TEST'?
> There's also no harm in keeping the southbridge reboot quirk I suppose,
> so I've dropped this as well:
>
> x86/reboot: Remove the RDC321X reboot quirk
Right. Same thing here: the code probably still works on later
R603x south bridges, but only triggers on the R6030 PCI ID
that is not used on supported chips. Most likely nothing
needs it.
Arnd
* Ingo Molnar <mingo@kernel.org> wrote:
> The patches most relevant to this discussion should be:
>
> x86/cpu: Remove M486/M486SX/ELAN support
> ...
> x86/cpu: Remove TSC-less CONFIG_M586 support
> x86/cpu: Make CONFIG_X86_TSC unconditional
> x86/cpu: Make CONFIG_X86_CX8 unconditional
> x86/percpu: Remove !CONFIG_X86_CX8 methods
> x86/atomics: Remove !CONFIG_X86_CX8 methods
also:
> x86/cpu: Remove TSC-less CONFIG_M586 support
Thanks,
Ingo
On April 24, 2025 9:07:50 AM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Thu, 24 Apr 2025 at 01:01, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> Thanks for confirming. So a 486-targeted kernel still passes
>> your tests on modern hardware if we force TSC and CX8 to
>> be enabled, but the boot fails if the options are turned
>> off in Kconfig (though available in emulated hardware).
>
>I wouldn't expect CX8 to really matter - it causes us to generate
>extra code to pick one over the other, but on modern hardware we'll
>still always then dynamically pick the cmpxchg8b instruction.
>
>Could it trigger bugs in our alternatives, or some miscompilation due
>to the extra complexity? Sure. But it does sound unlikely.
>
>> To be completely sure, you could re-run the same test with
>> just one of these enabled, but I'm rather sure that the TSC
>> is the root cause.
>
>Agreed.
>
>Particularly when the lockup is then in timekeeping_notify() during
>the initial initcalls -> clocksource_select(), I'm pretty sure this is
>purely about TSC.
>
>That said, maybe the problem is in the watchdog logic, because
>clocksource_done_booting() is what starts the watchdog thread.
>
>So it might be the watchdog code itself that then gets confused
>(because of some "don't use tsc" case that never gets any testing in
>real life) and triggers immediately - and then points the finger at
>the clocksource code only because that's what is still running.
>
>Because CONFIG_X86_TSC does cause some oddities: we end up still
>*using* the TSC for many things if the hardware supports it (which
>modern hardware obviously does), but then other things get disabled
>entirely.
>
>For example, this:
>
> /*
> * Boot-time check whether the TSCs are synchronized across
> * all CPUs/cores:
> */
> #ifdef CONFIG_X86_TSC
> extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
> extern void tsc_verify_tsc_adjust(bool resume);
> extern void check_tsc_sync_target(void);
> #else
> static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
> static inline void tsc_verify_tsc_adjust(bool resume) { }
> static inline void check_tsc_sync_target(void) { }
> #endif
>
>So that tsc_store_and_check_tsc_adjust() thing etc never gets run,
>even though we actually *do* use TSC for get_cycles() and friends,
>because *that* code checks the runtime status too:
>
>Now, none of that should matter - because all *those* things are about
>details that simply aren't relevant for any of this case - but maybe
>there is some other situation that has similar "I'm actually using the
>TSC through get_cycles(), but I didn't do some setup because X86_TSC
>wasn't on.."
>
>I really get the feeling that it's time to leave i486 support behind.
>There's zero real reason for anybody to waste one second of
>development effort on this kind of issue.
>
> Linus
Well, isn't the whole point that his patches remove the cx8 fallback code?