Hello,
by this commit, we see the config has below diff:
--- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config 2024-11-22 17:03:32.458344665 +0800
+++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config 2024-11-22 17:02:59.440805587 +0800
@@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
# end of BPF subsystem
CONFIG_PREEMPT_BUILD=y
-CONFIG_PREEMPT_NONE=y
+CONFIG_ARCH_HAS_PREEMPT_LAZY=y
+# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
+CONFIG_PREEMPT_LAZY=y
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
and we observed the issue does not always happen, but with high rate:
35772d627b55cc7f 476e8583ca16eecec0a3a28b6ee
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:47 70% 33:48 dmesg.RIP:torture_sched_setaffinity
:47 70% 33:48 dmesg.WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity
below is full report FYI.
kernel test robot noticed "WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity" on:
commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master fc39fb56917bb3cb53e99560ca3612a84456ada2]
[test failed on linux-next/master decc701f41d07481893fdea942c0ac6b226e84cd]
in testcase: rcutorture
version:
with following parameters:
runtime: 300s
test: cpuhotplug
torture_type: trivial
config: x86_64-randconfig-161-20241120
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202411252253.e39d77c6-lkp@intel.com
[ 150.797530][ T445] ------------[ cut here ]------------
[ 150.797915][ T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
[ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
[ 150.798909][ T445] Modules linked in: rcutorture torture ipmi_devintf ipmi_msghandler joydev crc32c_intel polyval_clmulni sha512_ssse3 sha256_ssse3 psmouse pcspkr input_leds led_class evdev drm drm_panel_orientation_quirks efi_pstore i2c_core dmi_sysfs qemu_fw_cfg aesni_intel nhc_udp nhc_routing nhc_mobility nhc_hop nhc_fragment nhc_dest
[ 150.800845][ T445] CPU: 0 UID: 0 PID: 445 Comm: rcu_torture_wri Tainted: G W 6.12.0-rc4-00032-g476e8583ca16 #1 2d4e54818569ec14d7868b657e3682f49dc47408
[ 150.801671][ T445] Tainted: [W]=WARN
[ 150.801886][ T445] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 150.802444][ T445] RIP: 0010:torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
[ 150.802909][ T445] Code: 11 06 00 75 e7 89 c1 48 89 45 f0 c6 05 9e 73 11 06 01 90 48 c7 c6 a0 07 0e 85 44 89 e2 48 c7 c7 00 d9 0d 85 e8 88 86 e0 ff 90 <0f> 0b 90 90 48 8b 45 f0 4c 8b 65 f8 c9 48 98 31 d2 31 c9 31 f6 31
All code
========
0: 11 06 adc %eax,(%rsi)
2: 00 75 e7 add %dh,-0x19(%rbp)
5: 89 c1 mov %eax,%ecx
7: 48 89 45 f0 mov %rax,-0x10(%rbp)
b: c6 05 9e 73 11 06 01 movb $0x1,0x611739e(%rip) # 0x61173b0
12: 90 nop
13: 48 c7 c6 a0 07 0e 85 mov $0xffffffff850e07a0,%rsi
1a: 44 89 e2 mov %r12d,%edx
1d: 48 c7 c7 00 d9 0d 85 mov $0xffffffff850dd900,%rdi
24: e8 88 86 e0 ff call 0xffffffffffe086b1
29: 90 nop
2a:* 0f 0b ud2 <-- trapping instruction
2c: 90 nop
2d: 90 nop
2e: 48 8b 45 f0 mov -0x10(%rbp),%rax
32: 4c 8b 65 f8 mov -0x8(%rbp),%r12
36: c9 leave
37: 48 98 cltq
39: 31 d2 xor %edx,%edx
3b: 31 c9 xor %ecx,%ecx
3d: 31 f6 xor %esi,%esi
3f: 31 .byte 0x31
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 90 nop
3: 90 nop
4: 48 8b 45 f0 mov -0x10(%rbp),%rax
8: 4c 8b 65 f8 mov -0x8(%rbp),%r12
c: c9 leave
d: 48 98 cltq
f: 31 d2 xor %edx,%edx
11: 31 c9 xor %ecx,%ecx
13: 31 f6 xor %esi,%esi
15: 31 .byte 0x31
[ 150.803951][ T445] RSP: 0018:ffffc90001d0fc50 EFLAGS: 00010246
[ 150.804295][ T445] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[ 150.804738][ T445] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 150.805173][ T445] RBP: ffffc90001d0fc60 R08: 0000000000000000 R09: 0000000000000000
[ 150.805664][ T445] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001bd
[ 150.806126][ T445] R13: ffffed10298bc638 R14: dffffc0000000000 R15: ffff88814c5e2c00
[ 150.806567][ T445] FS: 0000000000000000(0000) GS:ffff8883aee00000(0000) knlGS:0000000000000000
[ 150.807061][ T445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 150.807425][ T445] CR2: 00007fd03312b000 CR3: 0000000101aa7000 CR4: 00000000000406f0
[ 150.807863][ T445] Call Trace:
[ 150.808055][ T445] <TASK>
[ 150.808230][ T445] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 150.808482][ T445] ? __warn (kernel/panic.c:748)
[ 150.808722][ T445] ? torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
[ 150.809050][ T445] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 150.809325][ T445] ? handle_bug (arch/x86/kernel/traps.c:285)
[ 150.809631][ T445] ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
[ 150.809907][ T445] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 150.810206][ T445] ? torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
[ 150.810536][ T445] synchronize_rcu_trivial (kernel/rcu/rcutorture.c:837) rcutorture
[ 150.811100][ T445] ? debug_smp_processor_id (lib/smp_processor_id.c:61)
[ 150.811407][ T445] ? rcu_torture_read_exit (kernel/rcu/rcutorture.c:832) rcutorture
[ 150.811996][ T445] do_rtws_sync+0x395/0x6e0 rcutorture
[ 150.812579][ T445] ? srcu_read_delay (kernel/rcu/rcutorture.c:1328) rcutorture
[ 150.813133][ T445] ? preempt_count_sub (kernel/sched/core.c:5867 kernel/sched/core.c:5863 kernel/sched/core.c:5885)
[ 150.813443][ T445] ? delay_tsc (arch/x86/include/asm/preempt.h:103 arch/x86/lib/delay.c:96)
[ 150.813697][ T445] rcu_torture_writer (kernel/rcu/rcutorture.c:1578) rcutorture
[ 150.814240][ T445] ? __kthread_parkme (include/linux/instrumented.h:68 include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/kthread.c:280)
[ 150.814524][ T445] ? reacquire_held_locks (kernel/locking/lockdep.c:5476)
[ 150.814841][ T445] ? rcu_torture_fwd_prog_cr (kernel/rcu/rcutorture.c:1364) rcutorture
[ 150.815419][ T445] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:103 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
[ 150.815756][ T445] ? __kthread_parkme (include/linux/instrumented.h:68 include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/kthread.c:280)
[ 150.816076][ T445] ? __kasan_check_read (mm/kasan/shadow.c:32)
[ 150.816368][ T445] ? __kthread_parkme (arch/x86/include/asm/bitops.h:206 arch/x86/include/asm/bitops.h:238 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/kthread.c:280)
[ 150.816648][ T445] ? trace_preempt_on (kernel/trace/trace_preemptirq.c:105)
[ 150.816936][ T445] ? rcu_torture_fwd_prog_cr (kernel/rcu/rcutorture.c:1364) rcutorture
[ 150.817541][ T445] kthread (kernel/kthread.c:389)
[ 150.817780][ T445] ? trace_sched_kthread_stop_ret (kernel/kthread.c:342)
[ 150.818127][ T445] ret_from_fork (arch/x86/kernel/process.c:153)
[ 150.818382][ T445] ? trace_sched_kthread_stop_ret (kernel/kthread.c:342)
[ 150.818730][ T445] ret_from_fork_asm (arch/x86/entry/entry_64.S:254)
[ 150.819022][ T445] </TASK>
[ 150.819201][ T445] irq event stamp: 230151
[ 150.819484][ T445] hardirqs last enabled at (230163): __up_console_sem (arch/x86/include/asm/irqflags.h:42 (discriminator 1) arch/x86/include/asm/irqflags.h:97 (discriminator 1) arch/x86/include/asm/irqflags.h:155 (discriminator 1) kernel/printk/printk.c:344 (discriminator 1))
[ 150.820008][ T445] hardirqs last disabled at (230172): __up_console_sem (kernel/printk/printk.c:342 (discriminator 1))
[ 150.820529][ T445] softirqs last enabled at (229700): handle_softirqs (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:401 kernel/softirq.c:582)
[ 150.821054][ T445] softirqs last disabled at (229651): irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637 kernel/softirq.c:627 kernel/softirq.c:649)
[ 150.821585][ T445] ---[ end trace 0000000000000000 ]---
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241125/202411252253.e39d77c6-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 2024-11-25 22:29:09 [+0800], kernel test robot wrote:
> Hello,
Hi,
> by this commit, we see the config has below diff:
>
> --- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config 2024-11-22 17:03:32.458344665 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config 2024-11-22 17:02:59.440805587 +0800
> @@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
> # end of BPF subsystem
>
> CONFIG_PREEMPT_BUILD=y
> -CONFIG_PREEMPT_NONE=y
> +CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> +# CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> +CONFIG_PREEMPT_LAZY=y
> # CONFIG_PREEMPT_RT is not set
> CONFIG_PREEMPT_COUNT=y
> CONFIG_PREEMPTION=y
>
…
> commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
…
> runtime: 300s
> test: cpuhotplug
> torture_type: trivial
…
> [ 150.797530][ T445] ------------[ cut here ]------------
> [ 150.797915][ T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
> [ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
I've been staring at this, and this is actually fine. Your config changes
from CONFIG_PREEMPT_NONE to CONFIG_PREEMPT_LAZY which implies
CONFIG_PREEMPTION. The trivial RCU test there does sched_setaffinity()
while preemption is enabled and CPU-hotplug runs in the background. So
you get what you expect either by an attempt to move to a CPU which is
no longer valid or by getting migrated to another CPU in the middle of
your operation.
This is all fine. You need to update your config file or your test.
Sebastian
On Tue, Nov 26, 2024 at 03:30:24PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-11-25 22:29:09 [+0800], kernel test robot wrote:
> > Hello,
> Hi,
>
> > by this commit, we see the config has below diff:
> >
> > --- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config 2024-11-22 17:03:32.458344665 +0800
> > +++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config 2024-11-22 17:02:59.440805587 +0800
> > @@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
> > # end of BPF subsystem
> >
> > CONFIG_PREEMPT_BUILD=y
> > -CONFIG_PREEMPT_NONE=y
> > +CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> > +# CONFIG_PREEMPT_NONE is not set
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > # CONFIG_PREEMPT is not set
> > +CONFIG_PREEMPT_LAZY=y
> > # CONFIG_PREEMPT_RT is not set
> > CONFIG_PREEMPT_COUNT=y
> > CONFIG_PREEMPTION=y
> >
> …
> > commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
> …
> > runtime: 300s
> > test: cpuhotplug
> > torture_type: trivial
> …
>
> > [ 150.797530][ T445] ------------[ cut here ]------------
> > [ 150.797915][ T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
> > [ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
>
> I've been staring at this, and this is actually fine. Your config changes
> from CONFIG_PREEMPT_NONE to CONFIG_PREEMPT_LAZY which implies
> CONFIG_PREEMPTION. The trivial RCU test there does sched_setaffinity()
> while preemption is enabled and CPU-hotplug runs in the background. So
> you get what you expect either by an attempt to move to a CPU which is
> no longer valid or by getting migrated to another CPU in the middle of
> your operation.
>
> This is all fine. You need to update your config file or your test.
Good point, and I suspect that there will be a few rcutorture scenario
adjustments coming along shortly after -rc1 comes out.
Thanx, Paul
hi, Sebastian Andrzej Siewior, hi, Paul,
On Wed, Nov 27, 2024 at 12:17:49AM +0800, Paul E. McKenney wrote:
> On Tue, Nov 26, 2024 at 03:30:24PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2024-11-25 22:29:09 [+0800], kernel test robot wrote:
> > > Hello,
> > Hi,
> >
> > > by this commit, we see the config has below diff:
> > >
> > > --- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config 2024-11-22 17:03:32.458344665 +0800
> > > +++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config 2024-11-22 17:02:59.440805587 +0800
> > > @@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
> > > # end of BPF subsystem
> > >
> > > CONFIG_PREEMPT_BUILD=y
> > > -CONFIG_PREEMPT_NONE=y
> > > +CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> > > +# CONFIG_PREEMPT_NONE is not set
> > > # CONFIG_PREEMPT_VOLUNTARY is not set
> > > # CONFIG_PREEMPT is not set
> > > +CONFIG_PREEMPT_LAZY=y
> > > # CONFIG_PREEMPT_RT is not set
> > > CONFIG_PREEMPT_COUNT=y
> > > CONFIG_PREEMPTION=y
> > >
> > …
> > > commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
> > …
> > > runtime: 300s
> > > test: cpuhotplug
> > > torture_type: trivial
> > …
> >
> > > [ 150.797530][ T445] ------------[ cut here ]------------
> > > [ 150.797915][ T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
> > > [ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3))
> >
> > I've been staring at this, and this is actually fine. Your config changes
> > from CONFIG_PREEMPT_NONE to CONFIG_PREEMPT_LAZY which implies
> > CONFIG_PREEMPTION. The trivial RCU test there does sched_setaffinity()
> > while preemption is enabled and CPU-hotplug runs in the background. So
> > you get what you expect either by an attempt to move to a CPU which is
> > no longer valid or by getting migrated to another CPU in the middle of
> > your operation.
> >
> > This is all fine. You need to update your config file or your test.
>
> Good point, and I suspect that there will be a few rcutorture scenario
> adjustments coming along shortly after -rc1 comes out.
thanks a lot for information!
we will fix our config and to avoid reporting against this commit.
>
> Thanx, Paul
© 2016 - 2026 Red Hat, Inc.