[linus:master] [sched, x86] 476e8583ca: WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity

kernel test robot posted 1 patch 1 year, 2 months ago
[linus:master] [sched, x86] 476e8583ca: WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity
Posted by kernel test robot 1 year, 2 months ago


Hello,


by this commit, we see the config has below diff:

--- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config   2024-11-22 17:03:32.458344665 +0800
+++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config   2024-11-22 17:02:59.440805587 +0800
@@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
 # end of BPF subsystem

 CONFIG_PREEMPT_BUILD=y
-CONFIG_PREEMPT_NONE=y
+CONFIG_ARCH_HAS_PREEMPT_LAZY=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
 # CONFIG_PREEMPT is not set
+CONFIG_PREEMPT_LAZY=y
 # CONFIG_PREEMPT_RT is not set
 CONFIG_PREEMPT_COUNT=y
 CONFIG_PREEMPTION=y


and we observed the issue does not always happen, but with high rate:

35772d627b55cc7f 476e8583ca16eecec0a3a28b6ee
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :47          70%          33:48    dmesg.RIP:torture_sched_setaffinity
           :47          70%          33:48    dmesg.WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity


below is full report FYI.


kernel test robot noticed "WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity" on:

commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      fc39fb56917bb3cb53e99560ca3612a84456ada2]
[test failed on linux-next/master decc701f41d07481893fdea942c0ac6b226e84cd]

in testcase: rcutorture
version: 
with following parameters:

	runtime: 300s
	test: cpuhotplug
	torture_type: trivial



config: x86_64-randconfig-161-20241120
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202411252253.e39d77c6-lkp@intel.com


[  150.797530][  T445] ------------[ cut here ]------------
[  150.797915][  T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
[ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 
[  150.798909][  T445] Modules linked in: rcutorture torture ipmi_devintf ipmi_msghandler joydev crc32c_intel polyval_clmulni sha512_ssse3 sha256_ssse3 psmouse pcspkr input_leds led_class evdev drm drm_panel_orientation_quirks efi_pstore i2c_core dmi_sysfs qemu_fw_cfg aesni_intel nhc_udp nhc_routing nhc_mobility nhc_hop nhc_fragment nhc_dest
[  150.800845][  T445] CPU: 0 UID: 0 PID: 445 Comm: rcu_torture_wri Tainted: G        W          6.12.0-rc4-00032-g476e8583ca16 #1 2d4e54818569ec14d7868b657e3682f49dc47408
[  150.801671][  T445] Tainted: [W]=WARN
[  150.801886][  T445] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 150.802444][ T445] RIP: 0010:torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 
[ 150.802909][ T445] Code: 11 06 00 75 e7 89 c1 48 89 45 f0 c6 05 9e 73 11 06 01 90 48 c7 c6 a0 07 0e 85 44 89 e2 48 c7 c7 00 d9 0d 85 e8 88 86 e0 ff 90 <0f> 0b 90 90 48 8b 45 f0 4c 8b 65 f8 c9 48 98 31 d2 31 c9 31 f6 31
All code
========
   0:	11 06                	adc    %eax,(%rsi)
   2:	00 75 e7             	add    %dh,-0x19(%rbp)
   5:	89 c1                	mov    %eax,%ecx
   7:	48 89 45 f0          	mov    %rax,-0x10(%rbp)
   b:	c6 05 9e 73 11 06 01 	movb   $0x1,0x611739e(%rip)        # 0x61173b0
  12:	90                   	nop
  13:	48 c7 c6 a0 07 0e 85 	mov    $0xffffffff850e07a0,%rsi
  1a:	44 89 e2             	mov    %r12d,%edx
  1d:	48 c7 c7 00 d9 0d 85 	mov    $0xffffffff850dd900,%rdi
  24:	e8 88 86 e0 ff       	call   0xffffffffffe086b1
  29:	90                   	nop
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	90                   	nop
  2d:	90                   	nop
  2e:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
  32:	4c 8b 65 f8          	mov    -0x8(%rbp),%r12
  36:	c9                   	leave
  37:	48 98                	cltq
  39:	31 d2                	xor    %edx,%edx
  3b:	31 c9                	xor    %ecx,%ecx
  3d:	31 f6                	xor    %esi,%esi
  3f:	31                   	.byte 0x31

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	90                   	nop
   3:	90                   	nop
   4:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
   8:	4c 8b 65 f8          	mov    -0x8(%rbp),%r12
   c:	c9                   	leave
   d:	48 98                	cltq
   f:	31 d2                	xor    %edx,%edx
  11:	31 c9                	xor    %ecx,%ecx
  13:	31 f6                	xor    %esi,%esi
  15:	31                   	.byte 0x31
[  150.803951][  T445] RSP: 0018:ffffc90001d0fc50 EFLAGS: 00010246
[  150.804295][  T445] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[  150.804738][  T445] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  150.805173][  T445] RBP: ffffc90001d0fc60 R08: 0000000000000000 R09: 0000000000000000
[  150.805664][  T445] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001bd
[  150.806126][  T445] R13: ffffed10298bc638 R14: dffffc0000000000 R15: ffff88814c5e2c00
[  150.806567][  T445] FS:  0000000000000000(0000) GS:ffff8883aee00000(0000) knlGS:0000000000000000
[  150.807061][  T445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  150.807425][  T445] CR2: 00007fd03312b000 CR3: 0000000101aa7000 CR4: 00000000000406f0
[  150.807863][  T445] Call Trace:
[  150.808055][  T445]  <TASK>
[ 150.808230][ T445] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[ 150.808482][ T445] ? __warn (kernel/panic.c:748) 
[ 150.808722][ T445] ? torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 
[ 150.809050][ T445] ? report_bug (lib/bug.c:180 lib/bug.c:219) 
[ 150.809325][ T445] ? handle_bug (arch/x86/kernel/traps.c:285) 
[ 150.809631][ T445] ? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1)) 
[ 150.809907][ T445] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
[ 150.810206][ T445] ? torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 
[ 150.810536][ T445] synchronize_rcu_trivial (kernel/rcu/rcutorture.c:837) rcutorture
[ 150.811100][ T445] ? debug_smp_processor_id (lib/smp_processor_id.c:61) 
[ 150.811407][ T445] ? rcu_torture_read_exit (kernel/rcu/rcutorture.c:832) rcutorture
[ 150.811996][ T445] do_rtws_sync+0x395/0x6e0 rcutorture
[ 150.812579][ T445] ? srcu_read_delay (kernel/rcu/rcutorture.c:1328) rcutorture
[ 150.813133][ T445] ? preempt_count_sub (kernel/sched/core.c:5867 kernel/sched/core.c:5863 kernel/sched/core.c:5885) 
[ 150.813443][ T445] ? delay_tsc (arch/x86/include/asm/preempt.h:103 arch/x86/lib/delay.c:96) 
[ 150.813697][ T445] rcu_torture_writer (kernel/rcu/rcutorture.c:1578) rcutorture
[ 150.814240][ T445] ? __kthread_parkme (include/linux/instrumented.h:68 include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/kthread.c:280) 
[ 150.814524][ T445] ? reacquire_held_locks (kernel/locking/lockdep.c:5476) 
[ 150.814841][ T445] ? rcu_torture_fwd_prog_cr (kernel/rcu/rcutorture.c:1364) rcutorture
[ 150.815419][ T445] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:103 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) 
[ 150.815756][ T445] ? __kthread_parkme (include/linux/instrumented.h:68 include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/kthread.c:280) 
[ 150.816076][ T445] ? __kasan_check_read (mm/kasan/shadow.c:32) 
[ 150.816368][ T445] ? __kthread_parkme (arch/x86/include/asm/bitops.h:206 arch/x86/include/asm/bitops.h:238 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/kthread.c:280) 
[ 150.816648][ T445] ? trace_preempt_on (kernel/trace/trace_preemptirq.c:105) 
[ 150.816936][ T445] ? rcu_torture_fwd_prog_cr (kernel/rcu/rcutorture.c:1364) rcutorture
[ 150.817541][ T445] kthread (kernel/kthread.c:389) 
[ 150.817780][ T445] ? trace_sched_kthread_stop_ret (kernel/kthread.c:342) 
[ 150.818127][ T445] ret_from_fork (arch/x86/kernel/process.c:153) 
[ 150.818382][ T445] ? trace_sched_kthread_stop_ret (kernel/kthread.c:342) 
[ 150.818730][ T445] ret_from_fork_asm (arch/x86/entry/entry_64.S:254) 
[  150.819022][  T445]  </TASK>
[  150.819201][  T445] irq event stamp: 230151
[ 150.819484][ T445] hardirqs last enabled at (230163): __up_console_sem (arch/x86/include/asm/irqflags.h:42 (discriminator 1) arch/x86/include/asm/irqflags.h:97 (discriminator 1) arch/x86/include/asm/irqflags.h:155 (discriminator 1) kernel/printk/printk.c:344 (discriminator 1)) 
[ 150.820008][ T445] hardirqs last disabled at (230172): __up_console_sem (kernel/printk/printk.c:342 (discriminator 1)) 
[ 150.820529][ T445] softirqs last enabled at (229700): handle_softirqs (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:401 kernel/softirq.c:582) 
[ 150.821054][ T445] softirqs last disabled at (229651): irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637 kernel/softirq.c:627 kernel/softirq.c:649) 
[  150.821585][  T445] ---[ end trace 0000000000000000 ]---



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241125/202411252253.e39d77c6-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [linus:master] [sched, x86] 476e8583ca: WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity
Posted by Sebastian Andrzej Siewior 1 year, 2 months ago
On 2024-11-25 22:29:09 [+0800], kernel test robot wrote:
> Hello,
Hi,

> by this commit, we see the config has below diff:
> 
> --- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config   2024-11-22 17:03:32.458344665 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config   2024-11-22 17:02:59.440805587 +0800
> @@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
>  # end of BPF subsystem
> 
>  CONFIG_PREEMPT_BUILD=y
> -CONFIG_PREEMPT_NONE=y
> +CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> +# CONFIG_PREEMPT_NONE is not set
>  # CONFIG_PREEMPT_VOLUNTARY is not set
>  # CONFIG_PREEMPT is not set
> +CONFIG_PREEMPT_LAZY=y
>  # CONFIG_PREEMPT_RT is not set
>  CONFIG_PREEMPT_COUNT=y
>  CONFIG_PREEMPTION=y
> 
…
> commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
…
> 	runtime: 300s
> 	test: cpuhotplug
> 	torture_type: trivial
…

> [  150.797530][  T445] ------------[ cut here ]------------
> [  150.797915][  T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
> [ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 

I've been staring at this, and this is actually fine. Your config changes
from CONFIG_PREEMPT_NONE to CONFIG_PREEMPT_LAZY which implies
CONFIG_PREEMPTION. The trivial RCU test there does sched_setaffinity()
while preemption is enabled and CPU-hotplug runs in the background. So
you get what you expect either by an attempt to move to a CPU which is
no longer valid or by getting migrated to another CPU in the middle of
your operation.

This is all fine. You need to update your config file or your test.

Sebastian
Re: [linus:master] [sched, x86] 476e8583ca: WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity
Posted by Paul E. McKenney 1 year, 2 months ago
On Tue, Nov 26, 2024 at 03:30:24PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-11-25 22:29:09 [+0800], kernel test robot wrote:
> > Hello,
> Hi,
> 
> > by this commit, we see the config has below diff:
> > 
> > --- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config   2024-11-22 17:03:32.458344665 +0800
> > +++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config   2024-11-22 17:02:59.440805587 +0800
> > @@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
> >  # end of BPF subsystem
> > 
> >  CONFIG_PREEMPT_BUILD=y
> > -CONFIG_PREEMPT_NONE=y
> > +CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> > +# CONFIG_PREEMPT_NONE is not set
> >  # CONFIG_PREEMPT_VOLUNTARY is not set
> >  # CONFIG_PREEMPT is not set
> > +CONFIG_PREEMPT_LAZY=y
> >  # CONFIG_PREEMPT_RT is not set
> >  CONFIG_PREEMPT_COUNT=y
> >  CONFIG_PREEMPTION=y
> > 
> …
> > commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
> …
> > 	runtime: 300s
> > 	test: cpuhotplug
> > 	torture_type: trivial
> …
> 
> > [  150.797530][  T445] ------------[ cut here ]------------
> > [  150.797915][  T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
> > [ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 
> 
> I've been staring at this, and this is actually fine. Your config changes
> from CONFIG_PREEMPT_NONE to CONFIG_PREEMPT_LAZY which implies
> CONFIG_PREEMPTION. The trivial RCU test there does sched_setaffinity()
> while preemption is enabled and CPU-hotplug runs in the background. So
> you get what you expect either by an attempt to move to a CPU which is
> no longer valid or by getting migrated to another CPU in the middle of
> your operation.
> 
> This is all fine. You need to update your config file or your test.

Good point, and I suspect that there will be a few rcutorture scenario
adjustments coming along shortly after -rc1 comes out.

							Thanx, Paul
Re: [linus:master] [sched, x86] 476e8583ca: WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity
Posted by Oliver Sang 1 year, 2 months ago
hi, Sebastian Andrzej Siewior, hi, Paul,

On Wed, Nov 27, 2024 at 12:17:49AM +0800, Paul E. McKenney wrote:
> On Tue, Nov 26, 2024 at 03:30:24PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2024-11-25 22:29:09 [+0800], kernel test robot wrote:
> > > Hello,
> > Hi,
> > 
> > > by this commit, we see the config has below diff:
> > > 
> > > --- /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/35772d627b55cc7fb4f33bae57c564a25b3121a9/.config   2024-11-22 17:03:32.458344665 +0800
> > > +++ /pkg/linux/x86_64-randconfig-161-20241120/gcc-12/476e8583ca16eecec0a3a28b6ee7130f4e369389/.config   2024-11-22 17:02:59.440805587 +0800
> > > @@ -121,9 +121,11 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
> > >  # end of BPF subsystem
> > > 
> > >  CONFIG_PREEMPT_BUILD=y
> > > -CONFIG_PREEMPT_NONE=y
> > > +CONFIG_ARCH_HAS_PREEMPT_LAZY=y
> > > +# CONFIG_PREEMPT_NONE is not set
> > >  # CONFIG_PREEMPT_VOLUNTARY is not set
> > >  # CONFIG_PREEMPT is not set
> > > +CONFIG_PREEMPT_LAZY=y
> > >  # CONFIG_PREEMPT_RT is not set
> > >  CONFIG_PREEMPT_COUNT=y
> > >  CONFIG_PREEMPTION=y
> > > 
> > …
> > > commit: 476e8583ca16eecec0a3a28b6ee7130f4e369389 ("sched, x86: Enable Lazy preemption")
> > …
> > > 	runtime: 300s
> > > 	test: cpuhotplug
> > > 	torture_type: trivial
> > …
> > 
> > > [  150.797530][  T445] ------------[ cut here ]------------
> > > [  150.797915][  T445] torture_sched_setaffinity: sched_setaffinity(445) returned -22
> > > [ 150.798353][ T445] WARNING: CPU: 0 PID: 445 at kernel/rcu/update.c:535 torture_sched_setaffinity (kernel/rcu/update.c:535 (discriminator 3)) 
> > 
> > I've been staring at this, and this is actually fine. Your config changes
> > from CONFIG_PREEMPT_NONE to CONFIG_PREEMPT_LAZY which implies
> > CONFIG_PREEMPTION. The trivial RCU test there does sched_setaffinity()
> > while preemption is enabled and CPU-hotplug runs in the background. So
> > you get what you expect either by an attempt to move to a CPU which is
> > no longer valid or by getting migrated to another CPU in the middle of
> > your operation.
> > 
> > This is all fine. You need to update your config file or your test.
> 
> Good point, and I suspect that there will be a few rcutorture scenario
> adjustments coming along shortly after -rc1 comes out.

thanks a lot for information!

we will fix our config and to avoid reporting against this commit.

> 
> 							Thanx, Paul