[PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect

Jakub Kicinski posted 1 patch 1 month, 1 week ago
kernel/configs/debug.config | 1 +
1 file changed, 1 insertion(+)
[PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Jakub Kicinski 1 month, 1 week ago
Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
added CONFIG_PROVE_RCU_LIST=y to the common CI config,
but RCU_EXPERT is not set, and it's a dependency for
CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
indicate that it does catch bugs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
I'd be slightly tempted to still send it to Linux for v6.12.

CC: paulmck@kernel.org
CC: frederic@kernel.org
CC: neeraj.upadhyay@kernel.org
CC: joel@joelfernandes.org
CC: rcu@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: kees@kernel.org
CC: matttbe@kernel.org
---
 kernel/configs/debug.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
index 509ee703de15..20552f163930 100644
--- a/kernel/configs/debug.config
+++ b/kernel/configs/debug.config
@@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y
 #
 # RCU Debugging
 #
+CONFIG_RCU_EXPERT=y
 CONFIG_PROVE_RCU=y
 CONFIG_PROVE_RCU_LIST=y
 #
-- 
2.46.2
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Simon Horman 1 month, 1 week ago
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

I exercised this in conjunction with tools/testing/selftests/net/config
and the resulting .config is as described before and after.

Reviewed-by: Simon Horman <horms@kernel.org>
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Paul E. McKenney 1 month, 1 week ago
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> CC: paulmck@kernel.org
> CC: frederic@kernel.org
> CC: neeraj.upadhyay@kernel.org
> CC: joel@joelfernandes.org
> CC: rcu@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: kees@kernel.org
> CC: matttbe@kernel.org
> ---
>  kernel/configs/debug.config | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
> index 509ee703de15..20552f163930 100644
> --- a/kernel/configs/debug.config
> +++ b/kernel/configs/debug.config
> @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y
>  #
>  # RCU Debugging
>  #
> +CONFIG_RCU_EXPERT=y
>  CONFIG_PROVE_RCU=y
>  CONFIG_PROVE_RCU_LIST=y
>  #
> -- 
> 2.46.2
>
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Matthieu Baerts 1 month, 1 week ago
Hi Jakub,

On 16/10/2024 03:11, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.

Good catch! I confirm it fixes the issue:

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I'd be slightly tempted to still send it to Linux for v6.12.

Good idea, it sounds like a fix because I guess if it was on the list,
it was supposed to be used. But be careful it might detect quite a few
issues: I just enabled it on MPTCP tree, and it found issues.

It might be good to check the impact first, e.g. at least enabling it
when running the Networking tests. Please note that the CI didn't pick
up this patch, because it is marked for RCU (PATCH rcu):


https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/

If the impact is important, it might be better to target linux-next
first, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Jakub Kicinski 4 weeks, 1 day ago
On Wed, 16 Oct 2024 17:18:47 +0200 Matthieu Baerts wrote:
> https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/
> 
> If the impact is important, it might be better to target linux-next
> first, no?

Thanks for testing! I didn't anticipate it to be so effective.

Looks like it's not in -next, yet, and we got an Ack from Paul 
and co. so I'll toss it into net-next.
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Matthieu Baerts 4 weeks ago
Hi Jakub,

On 28/10/2024 18:22, Jakub Kicinski wrote:
> On Wed, 16 Oct 2024 17:18:47 +0200 Matthieu Baerts wrote:
>> https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/
>>
>> If the impact is important, it might be better to target linux-next
>> first, no?
> 
> Thanks for testing! I didn't anticipate it to be so effective.
> 
> Looks like it's not in -next, yet, and we got an Ack from Paul 
> and co. so I'll toss it into net-next.

Thanks!

Please note that the 3 issues I mentioned have been fixed somewhere:

- MPTCP: patches have been sent to the Netdev ML [1]
- Netfilter: patches have been sent to the Netfilter ML [2]
- perf (VM shutdown): patches have been applied in perf/urgent [3]

[1]
https://lore.kernel.org/20241021-net-mptcp-sched-lock-v1-1-637759cf061c@kernel.org
[2] https://lore.kernel.org/20241025133230.22491-2-fw@strlen.de
[3]
https://lore.kernel.org/20240913162340.2142976-1-kan.liang@linux.intel.com

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Matthieu Baerts 1 month, 1 week ago
On 16/10/2024 17:18, Matthieu Baerts wrote:
> Hi Jakub,
> 
> On 16/10/2024 03:11, Jakub Kicinski wrote:
>> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
>> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
>> but RCU_EXPERT is not set, and it's a dependency for
>> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
>> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
>> indicate that it does catch bugs.
> 
> Good catch! I confirm it fixes the issue:
> 
> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> 
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> Good idea, it sounds like a fix because I guess if it was on the list,
> it was supposed to be used. But be careful it might detect quite a few
> issues: I just enabled it on MPTCP tree, and it found issues.
Sorry, I forgot to mention that it found 3 issues: one specific to
MPTCP, one in Netfilter, but also one when shutting down the VM:


> # /usr/lib/klibc/bin/poweroff
> [ 2360.588763][T11825] 
> [ 2360.589006][T11825] =============================
> [ 2360.589424][T11825] WARNING: suspicious RCU usage
> [ 2360.589952][T11825] 6.12.0-rc2+ #1 Tainted: G                 N
> [ 2360.590522][T11825] -----------------------------
> [ 2360.590896][T11825] kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
> [ 2360.592341][T11825] 
> [ 2360.592341][T11825] other info that might help us debug this:
> [ 2360.592341][T11825] 
> [ 2360.593343][T11825] 
> [ 2360.593343][T11825] rcu_scheduler_active = 2, debug_locks = 1
> [ 2360.593951][T11825] 3 locks held by poweroff/11825:
> [2360.594481][T11825] #0: ffffffff89641e28 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot (kernel/reboot.c:770) 
> [2360.594997][T11825] #1: ffffffff8963eab0 ((reboot_notifier_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain (kernel/notifier.c:388) 
> [2360.595859][T11825] #2: ffffffff898d1a68 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context (kernel/events/core.c:13983) 
> [ 2360.596645][T11825] 
> [ 2360.596645][T11825] stack backtrace:
> [ 2360.597476][T11825] CPU: 3 UID: 0 PID: 11825 Comm: poweroff Tainted: G                 N 6.12.0-rc2+ #1
> [ 2360.597987][T11825] Tainted: [N]=TEST
> [ 2360.598291][T11825] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 2360.598890][T11825] Call Trace:
> [ 2360.599295][T11825]  <TASK>
> [2360.599568][T11825] dump_stack_lvl (lib/dump_stack.c:123) 
> [2360.600032][T11825] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) 
> [2360.600518][T11825] perf_event_clear_cpumask (kernel/events/core.c:13962 (discriminator 9)) 
> [2360.600972][T11825] ? __pfx_perf_event_clear_cpumask (kernel/events/core.c:13939) 
> [2360.601607][T11825] ? acpi_execute_simple_method (drivers/acpi/utils.c:679) 
> [2360.601988][T11825] ? __pfx_acpi_execute_simple_method (drivers/acpi/utils.c:679) 
> [2360.602577][T11825] ? md_notify_reboot (drivers/md/md.c:9860) 
> [2360.603043][T11825] perf_event_exit_cpu_context (kernel/events/core.c:13984 (discriminator 1)) 
> [2360.603543][T11825] perf_reboot (kernel/events/core.c:14066 (discriminator 3)) 
> [2360.603979][T11825] ? trace_notifier_run (include/trace/events/notifier.h:59 (discriminator 2)) 
> [2360.604454][T11825] notifier_call_chain (kernel/notifier.c:93) 
> [2360.604833][T11825] blocking_notifier_call_chain (kernel/notifier.c:389) 
> [2360.605522][T11825] kernel_power_off (kernel/reboot.c:294) 
> [2360.605908][T11825] __do_sys_reboot (kernel/reboot.c:771) 
> [2360.606320][T11825] ? __pfx___do_sys_reboot (kernel/reboot.c:717) 
> [2360.606739][T11825] ? __pfx_ksys_sync (fs/sync.c:98) 
> [2360.607113][T11825] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)) 
> [2360.607551][T11825] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) 
> [ 2360.607911][T11825] RIP: 0033:0x20e505
> [ 2360.608320][T11825] Code: 48 89 77 38 c3 89 f0 48 8b 1f 48 8b 67 08 48 8b 6f 10 4c 8b 67 18 4c 8b 6f 20 4c 8b 77 28 4c 8b 7f 30 ff 67 38 49 89 ca 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 f7 d8 89 05 d6 8e 00 00 48 83 c8 ff c3
> All code
> ========
>    0:	48 89 77 38          	mov    %rsi,0x38(%rdi)
>    4:	c3                   	ret
>    5:	89 f0                	mov    %esi,%eax
>    7:	48 8b 1f             	mov    (%rdi),%rbx
>    a:	48 8b 67 08          	mov    0x8(%rdi),%rsp
>    e:	48 8b 6f 10          	mov    0x10(%rdi),%rbp
>   12:	4c 8b 67 18          	mov    0x18(%rdi),%r12
>   16:	4c 8b 6f 20          	mov    0x20(%rdi),%r13
>   1a:	4c 8b 77 28          	mov    0x28(%rdi),%r14
>   1e:	4c 8b 7f 30          	mov    0x30(%rdi),%r15
>   22:	ff 67 38             	jmp    *0x38(%rdi)
>   25:	49 89 ca             	mov    %rcx,%r10
>   28:	0f 05                	syscall
>   2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>   30:	73 01                	jae    0x33
>   32:	c3                   	ret
>   33:	f7 d8                	neg    %eax
>   35:	89 05 d6 8e 00 00    	mov    %eax,0x8ed6(%rip)        # 0x8f11
>   3b:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>   3f:	c3                   	ret
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>    6:	73 01                	jae    0x9
>    8:	c3                   	ret
>    9:	f7 d8                	neg    %eax
>    b:	89 05 d6 8e 00 00    	mov    %eax,0x8ed6(%rip)        # 0x8ee7
>   11:	48 83 c8 ff          	or     $0xffffffffffffffff,%rax
>   15:	c3                   	ret
> [ 2360.609571][T11825] RSP: 002b:00007ffd898b0d58 EFLAGS: 00000217 ORIG_RAX: 00000000000000a9
> [ 2360.610171][T11825] RAX: ffffffffffffffda RBX: 000000004321fedc RCX: 000000000020e505
> [ 2360.610763][T11825] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead
> [ 2360.611461][T11825] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000213020
> [ 2360.611987][T11825] R10: 0000000000000000 R11: 0000000000000217 R12: 00007ffd898b0db8
> [ 2360.612583][T11825] R13: 0000000000401000 R14: 0000000000000000 R15: 0000000000000000
> [ 2360.613142][T11825]  </TASK>
> [ 2360.615641][T11825] ACPI: PM: Preparing to enter system sleep state S5
> [ 2360.617134][T11825] kvm: exiting hardware virtualization
> [ 2360.617715][T11825] reboot: Power down


I can easily reproduce this one: this means that all "debug" VMs will
report issues!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect
Posted by Joel Fernandes 1 month, 1 week ago
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote:
> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup")
> added CONFIG_PROVE_RCU_LIST=y to the common CI config,
> but RCU_EXPERT is not set, and it's a dependency for
> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage
> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking
> indicate that it does catch bugs.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

> ---
> I'd be slightly tempted to still send it to Linux for v6.12.
> 
> CC: paulmck@kernel.org
> CC: frederic@kernel.org
> CC: neeraj.upadhyay@kernel.org
> CC: joel@joelfernandes.org
> CC: rcu@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: kees@kernel.org
> CC: matttbe@kernel.org
> ---
>  kernel/configs/debug.config | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config
> index 509ee703de15..20552f163930 100644
> --- a/kernel/configs/debug.config
> +++ b/kernel/configs/debug.config
> @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y
>  #
>  # RCU Debugging
>  #
> +CONFIG_RCU_EXPERT=y
>  CONFIG_PROVE_RCU=y
>  CONFIG_PROVE_RCU_LIST=y
>  #
> -- 
> 2.46.2
>