[tip: perf/core] perf: Fix event_function_call() locking

tip-bot2 for Peter Zijlstra posted 1 patch 1 year, 6 months ago
kernel/events/core.c |  9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
[tip: perf/core] perf: Fix event_function_call() locking
Posted by tip-bot2 for Peter Zijlstra 1 year, 6 months ago
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     558abc7e3f895049faa46b08656be4c60dc6e9fd
Gitweb:        https://git.kernel.org/tip/558abc7e3f895049faa46b08656be4c60dc6e9fd
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 07 Aug 2024 13:29:27 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 08 Aug 2024 12:27:31 +02:00

perf: Fix event_function_call() locking

All the event_function/@func call context already uses perf_ctx_lock()
except for the !ctx->is_active case. Make it all consistent.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240807115550.138301094@infradead.org
---
 kernel/events/core.c |  9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index eb03c9a..ab49dea 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -263,6 +263,7 @@ unlock:
 static void event_function_call(struct perf_event *event, event_f func, void *data)
 {
 	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
 	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
 	struct event_function_struct efs = {
 		.event = event,
@@ -291,22 +292,22 @@ again:
 	if (!task_function_call(task, event_function, &efs))
 		return;
 
-	raw_spin_lock_irq(&ctx->lock);
+	perf_ctx_lock(cpuctx, ctx);
 	/*
 	 * Reload the task pointer, it might have been changed by
 	 * a concurrent perf_event_context_sched_out().
 	 */
 	task = ctx->task;
 	if (task == TASK_TOMBSTONE) {
-		raw_spin_unlock_irq(&ctx->lock);
+		perf_ctx_unlock(cpuctx, ctx);
 		return;
 	}
 	if (ctx->is_active) {
-		raw_spin_unlock_irq(&ctx->lock);
+		perf_ctx_unlock(cpuctx, ctx);
 		goto again;
 	}
 	func(event, NULL, ctx, data);
-	raw_spin_unlock_irq(&ctx->lock);
+	perf_ctx_unlock(cpuctx, ctx);
 }
 
 /*
Re: [tip: perf/core] perf: Fix event_function_call() locking
Posted by Pengfei Xu 1 year, 5 months ago
Hi Peter and perf experts,

There is "BUG: using smp_processor_id() in preemptible code in
event_function_call" in v6.11-rc3 Intel internal with linux-next patches in
syzkaller testing.
And this issue was reproduced in next-20240812 also.

Bisected and found related commit in Intel internal repo:
558abc7e3f89 perf: Fix event_function_call() locking

After revert above commit on top of Intel internal kernel, this issue
was gone.

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240812_194918_event_function_call
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/repro.c
Syzkaller repro syscall steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/repro.prog
Syzkaller analysis report: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/repro.report
Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/bisect_info.log
Issue bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240812_194918_event_function_call/bzImage_dcdef334e8852ca433e908db67744965101f3fc1.tar.gz
Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240812_194918_event_function_call/dcdef334e8852ca433e908db67744965101f3fc1_dmesg.log

"
[   17.071069] BUG: using smp_processor_id() in preemptible [00000000] code: repro/726
[   17.071875] caller is debug_smp_processor_id+0x20/0x30
[   17.072369] CPU: 0 UID: 0 PID: 726 Comm: repro Not tainted 6.11.0-rc3-next-20240812-dcdef334e885+ #1
[   17.073225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   17.074205] Call Trace:
[   17.074436]  <TASK>
[   17.074646]  dump_stack_lvl+0x121/0x150
[   17.075005]  dump_stack+0x19/0x20
[   17.075313]  check_preemption_disabled+0x168/0x180
[   17.075753]  debug_smp_processor_id+0x20/0x30
[   17.076156]  event_function_call+0xd7/0x5c0
[   17.076520]  ? __pfx___perf_remove_from_context+0x10/0x10
[   17.076986]  ? __this_cpu_preempt_check+0x21/0x30
[   17.077404]  ? __pfx_event_function_call+0x10/0x10
[   17.077850]  ? __this_cpu_preempt_check+0x21/0x30
[   17.078276]  ? _raw_spin_unlock_irq+0x2c/0x60
[   17.078681]  ? lockdep_hardirqs_on+0x89/0x110
[   17.079084]  perf_remove_from_context+0xf7/0x1d0
[   17.079506]  perf_event_release_kernel+0x186/0x870
[   17.079944]  ? lock_is_held_type+0xef/0x150
[   17.080326]  ? locks_remove_file+0x3bb/0x5e0
[   17.080727]  ? __pfx_perf_event_release_kernel+0x10/0x10
[   17.081202]  ? __this_cpu_preempt_check+0x21/0x30
[   17.081634]  ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
[   17.082116]  ? evm_file_release+0x193/0x1f0
[   17.082502]  ? __pfx_perf_release+0x10/0x10
[   17.082880]  perf_release+0x40/0x60
[   17.083203]  __fput+0x426/0xbc0
[   17.083509]  ____fput+0x1f/0x30
[   17.083806]  task_work_run+0x19c/0x2b0
[   17.084159]  ? __pfx_task_work_run+0x10/0x10
[   17.084553]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[   17.085036]  ? switch_task_namespaces+0xc6/0x110
[   17.085463]  do_exit+0xb19/0x2aa0
[   17.085778]  ? lock_release+0x441/0x870
[   17.086138]  ? __pfx_do_exit+0x10/0x10
[   17.086485]  ? __this_cpu_preempt_check+0x21/0x30
[   17.087007]  ? _raw_spin_unlock_irq+0x2c/0x60
[   17.087219]  ? lockdep_hardirqs_on+0x89/0x110
[   17.087426]  do_group_exit+0xe4/0x2c0
[   17.087607]  __x64_sys_exit_group+0x4d/0x60
[   17.087799]  x64_sys_call+0x20c4/0x20d0
[   17.087974]  do_syscall_64+0x6d/0x140
[   17.088144]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   17.088374] RIP: 0033:0x7f6c5a518a4d
[   17.088541] Code: Unable to access opcode bytes at 0x7f6c5a518a23.
[   17.088832] RSP: 002b:00007fffca44fe68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   17.089166] RAX: ffffffffffffffda RBX: 00007f6c5a5f69e0 RCX: 00007f6c5a518a4d
[   17.089476] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
[   17.089790] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[   17.090100] R10: 00007fffca44fd10 R11: 0000000000000246 R12: 00007f6c5a5f69e0
[   17.090411] R13: 00007f6c5a5fbf00 R14: 0000000000000001 R15: 00007f6c5a5fbee8
[   17.090740]  </TASK>
[   17.162307] BUG: using smp_processor_id() in preemptible [00000000] code: repro/727
[   17.163284] caller is debug_smp_processor_id+0x20/0x30
[   17.163765] CPU: 0 UID: 0 PID: 727 Comm: repro Not tainted 6.11.0-rc3-next-20240812-dcdef334e885+ #1
[   17.164571] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   17.165606] Call Trace:
[   17.165840]  <TASK>
[   17.166045]  dump_stack_lvl+0x121/0x150
[   17.166407]  dump_stack+0x19/0x20
[   17.166732]  check_preemption_disabled+0x168/0x180
[   17.166944]  debug_smp_processor_id+0x20/0x30
[   17.167153]  event_function_call+0xd7/0x5c0
[   17.167355]  ? __pfx___perf_remove_from_context+0x10/0x10
[   17.167611]  ? __this_cpu_preempt_check+0x21/0x30
[   17.167835]  ? __pfx_event_function_call+0x10/0x10
[   17.168072]  ? __this_cpu_preempt_check+0x21/0x30
[   17.168297]  ? _raw_spin_unlock_irq+0x2c/0x60
[   17.168505]  ? lockdep_hardirqs_on+0x89/0x110
[   17.168718]  perf_remove_from_context+0xf7/0x1d0
[   17.168939]  perf_event_release_kernel+0x186/0x870
[   17.169164]  ? lock_is_held_type+0xef/0x150
[   17.169362]  ? locks_remove_file+0x3bb/0x5e0
[   17.169546]  ? __pfx_perf_event_release_kernel+0x10/0x10
[   17.169791]  ? __this_cpu_preempt_check+0x21/0x30
[   17.169994]  ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
[   17.170219]  ? evm_file_release+0x193/0x1f0
[   17.170421]  ? __pfx_perf_release+0x10/0x10
[   17.170620]  perf_release+0x40/0x60
[   17.170790]  __fput+0x426/0xbc0
[   17.170938]  ____fput+0x1f/0x30
[   17.171082]  task_work_run+0x19c/0x2b0
[   17.171271]  ? __pfx_task_work_run+0x10/0x10
[   17.171478]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[   17.171732]  ? switch_task_namespaces+0xc6/0x110
[   17.171957]  do_exit+0xb19/0x2aa0
[   17.172122]  ? lock_release+0x441/0x870
[   17.172314]  ? __pfx_do_exit+0x10/0x10
[   17.172476]  ? __this_cpu_preempt_check+0x21/0x30
[   17.172700]  ? _raw_spin_unlock_irq+0x2c/0x60
[   17.172910]  ? lockdep_hardirqs_on+0x89/0x110
[   17.173126]  do_group_exit+0xe4/0x2c0
[   17.173307]  __x64_sys_exit_group+0x4d/0x60
[   17.173509]  x64_sys_call+0x20c4/0x20d0
[   17.173679]  do_syscall_64+0x6d/0x140
[   17.173854]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   17.174091] RIP: 0033:0x7fc803b18a4d
[   17.174245] Code: Unable to access opcode bytes at 0x7fc803b18a23.
[   17.174522] RSP: 002b:00007ffd949207a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   17.174892] RAX: ffffffffffffffda RBX: 00007fc803bf69e0 RCX: 00007fc803b18a4d
[   17.175208] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
[   17.175526] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[   17.175849] R10: 00007ffd94920650 R11: 0000000000000246 R12: 00007fc803bf69e0
[   17.176170] R13: 00007fc803bfbf00 R14: 0000000000000001 R15: 00007fc803bfbee8
[   17.176508]  </TASK>
"

I hope it's helpful.

Thanks!

---

If you don't need the following environment to reproduce the problem or if you
already have one reproduced environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Best Regards,
Thanks!


On 2024-08-08 at 10:32:08 -0000, tip-bot2 for Peter Zijlstra wrote:
> The following commit has been merged into the perf/core branch of tip:
> 
> Commit-ID:     558abc7e3f895049faa46b08656be4c60dc6e9fd
> Gitweb:        https://git.kernel.org/tip/558abc7e3f895049faa46b08656be4c60dc6e9fd
> Author:        Peter Zijlstra <peterz@infradead.org>
> AuthorDate:    Wed, 07 Aug 2024 13:29:27 +02:00
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Thu, 08 Aug 2024 12:27:31 +02:00
> 
> perf: Fix event_function_call() locking
> 
> All the event_function/@func call context already uses perf_ctx_lock()
> except for the !ctx->is_active case. Make it all consistent.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> Reviewed-by: Namhyung Kim <namhyung@kernel.org>
> Link: https://lore.kernel.org/r/20240807115550.138301094@infradead.org
> ---
>  kernel/events/core.c |  9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index eb03c9a..ab49dea 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -263,6 +263,7 @@ unlock:
>  static void event_function_call(struct perf_event *event, event_f func, void *data)
>  {
>  	struct perf_event_context *ctx = event->ctx;
> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
>  	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
>  	struct event_function_struct efs = {
>  		.event = event,
> @@ -291,22 +292,22 @@ again:
>  	if (!task_function_call(task, event_function, &efs))
>  		return;
>  
> -	raw_spin_lock_irq(&ctx->lock);
> +	perf_ctx_lock(cpuctx, ctx);
>  	/*
>  	 * Reload the task pointer, it might have been changed by
>  	 * a concurrent perf_event_context_sched_out().
>  	 */
>  	task = ctx->task;
>  	if (task == TASK_TOMBSTONE) {
> -		raw_spin_unlock_irq(&ctx->lock);
> +		perf_ctx_unlock(cpuctx, ctx);
>  		return;
>  	}
>  	if (ctx->is_active) {
> -		raw_spin_unlock_irq(&ctx->lock);
> +		perf_ctx_unlock(cpuctx, ctx);
>  		goto again;
>  	}
>  	func(event, NULL, ctx, data);
> -	raw_spin_unlock_irq(&ctx->lock);
> +	perf_ctx_unlock(cpuctx, ctx);
>  }
>  
>  /*
Re: [tip: perf/core] perf: Fix event_function_call() locking
Posted by Naresh Kamboju 1 year, 5 months ago
While running LTP test cases splice07 and perf_event_open01 found following
kernel BUG running on arm64 device juno-r2 and qemu-arm64 on the Linux
next-20240812 and next-20240813 tag.

  GOOD: next-20240809
  BAD: next-20240812

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

Test log:
--------
[ 2278.760258] check_preemption_disabled: 15 callbacks suppressed
[ 2278.760282] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076
[ 2278.775032] caller is debug_smp_processor_id+0x20/0x30
[ 2278.780270] CPU: 5 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1
[ 2278.789344] Hardware name: ARM Juno development board (r2) (DT)
[ 2278.795276] Call trace:
[ 2278.797724]  dump_backtrace+0x9c/0x128
[ 2278.801487]  show_stack+0x20/0x38
[ 2278.804812]  dump_stack_lvl+0xbc/0xd0
[ 2278.808487]  dump_stack+0x18/0x28
[ 2278.811811]  check_preemption_disabled+0xd8/0xf8
[ 2278.816446]  debug_smp_processor_id+0x20/0x30
[ 2278.820818]  event_function_call+0x54/0x168
[ 2278.825015]  _perf_event_enable+0x78/0xa8
[ 2278.829037]  perf_event_for_each_child+0x44/0xa0
[ 2278.833672]  _perf_ioctl+0x1bc/0xae0
[ 2278.837262]  perf_ioctl+0x58/0x90
[ 2278.840590]  __arm64_sys_ioctl+0xb4/0x100
[ 2278.844615]  invoke_syscall+0x50/0x120
[ 2278.848381]  el0_svc_common.constprop.0+0x48/0xf0
[ 2278.853103]  do_el0_svc+0x24/0x38
[ 2278.856432]  el0_svc+0x3c/0x108
[ 2278.859585]  el0t_64_sync_handler+0x120/0x130
[ 2278.863956]  el0t_64_sync+0x190/0x198
[ 2279.068732] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076
[ 2279.077570] caller is debug_smp_processor_id+0x20/0x30
[ 2279.082754] CPU: 1 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1
[ 2279.091823] Hardware name: ARM Juno development board (r2) (DT)

Full test log:
---------
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/testrun/24833616/suite/log-parser-test/test/check-kernel-bug/log
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/testrun/24833616/suite/log-parser-test/tests/
 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240812/testrun/24821160/suite/log-parser-test/test/check-kernel-bug-483bde618da4ec98e33eefb5e26adeb267f80cc2461569605f3166ce12b3fe82/log

metadata:
  artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKHYhiLx9/
  build-url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKHYhiLx9/
  build_name: gcc-13-lkftconfig-debug
  git_describe: next-20240812
  git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git_sha: 9e6869691724b12e1f43655eeedc35fade38120c
  kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKHYhiLx9/config
  kernel_version: 6.11.0-rc3
  toolchain: gcc-13

--
Linaro LKFT
https://lkft.linaro.org
Re: [tip: perf/core] perf: Fix event_function_call() locking
Posted by Namhyung Kim 1 year, 5 months ago
Hello,

On Tue, Aug 13, 2024 at 08:49:59PM +0530, Naresh Kamboju wrote:
> While running LTP test cases splice07 and perf_event_open01 found following
> kernel BUG running on arm64 device juno-r2 and qemu-arm64 on the Linux
> next-20240812 and next-20240813 tag.
> 
>   GOOD: next-20240809
>   BAD: next-20240812
> 
> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> 
> Test log:
> --------
> [ 2278.760258] check_preemption_disabled: 15 callbacks suppressed
> [ 2278.760282] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076
> [ 2278.775032] caller is debug_smp_processor_id+0x20/0x30
> [ 2278.780270] CPU: 5 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1
> [ 2278.789344] Hardware name: ARM Juno development board (r2) (DT)
> [ 2278.795276] Call trace:
> [ 2278.797724]  dump_backtrace+0x9c/0x128
> [ 2278.801487]  show_stack+0x20/0x38
> [ 2278.804812]  dump_stack_lvl+0xbc/0xd0
> [ 2278.808487]  dump_stack+0x18/0x28
> [ 2278.811811]  check_preemption_disabled+0xd8/0xf8
> [ 2278.816446]  debug_smp_processor_id+0x20/0x30
> [ 2278.820818]  event_function_call+0x54/0x168
> [ 2278.825015]  _perf_event_enable+0x78/0xa8
> [ 2278.829037]  perf_event_for_each_child+0x44/0xa0
> [ 2278.833672]  _perf_ioctl+0x1bc/0xae0
> [ 2278.837262]  perf_ioctl+0x58/0x90
> [ 2278.840590]  __arm64_sys_ioctl+0xb4/0x100
> [ 2278.844615]  invoke_syscall+0x50/0x120
> [ 2278.848381]  el0_svc_common.constprop.0+0x48/0xf0
> [ 2278.853103]  do_el0_svc+0x24/0x38
> [ 2278.856432]  el0_svc+0x3c/0x108
> [ 2278.859585]  el0t_64_sync_handler+0x120/0x130
> [ 2278.863956]  el0t_64_sync+0x190/0x198
> [ 2279.068732] BUG: using smp_processor_id() in preemptible [00000000] code: perf_event_open/111076
> [ 2279.077570] caller is debug_smp_processor_id+0x20/0x30
> [ 2279.082754] CPU: 1 UID: 0 PID: 111076 Comm: perf_event_open Not tainted 6.11.0-rc3-next-20240812 #1
> [ 2279.091823] Hardware name: ARM Juno development board (r2) (DT)
> 
> Full test log:
> ---------
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/testrun/24833616/suite/log-parser-test/test/check-kernel-bug/log
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240813/testrun/24833616/suite/log-parser-test/tests/
>  - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240812/testrun/24821160/suite/log-parser-test/test/check-kernel-bug-483bde618da4ec98e33eefb5e26adeb267f80cc2461569605f3166ce12b3fe82/log
> 
> metadata:
>   artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKHYhiLx9/
>   build-url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKHYhiLx9/
>   build_name: gcc-13-lkftconfig-debug
>   git_describe: next-20240812
>   git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git_sha: 9e6869691724b12e1f43655eeedc35fade38120c
>   kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2kXsz6nJO7pJ1nL4xGlKHYhiLx9/config
>   kernel_version: 6.11.0-rc3
>   toolchain: gcc-13

Thanks for the report, can you please check if it solves the problem?

Thanks,
Namhyung

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9893ba5e98aa..85204c2376fa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -298,13 +298,14 @@ static int event_function(void *info)
 static void event_function_call(struct perf_event *event, event_f func, void *data)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+	struct perf_cpu_context *cpuctx;
 	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
 	struct event_function_struct efs = {
 		.event = event,
 		.func = func,
 		.data = data,
 	};
+	unsigned long flags;
 
 	if (!event->parent) {
 		/*
@@ -327,22 +328,27 @@ static void event_function_call(struct perf_event *event, event_f func, void *da
 	if (!task_function_call(task, event_function, &efs))
 		return;
 
+	local_irq_save(flags);
+	cpuctx = this_cpu_ptr(&perf_cpu_context);
+
 	perf_ctx_lock(cpuctx, ctx);
 	/*
 	 * Reload the task pointer, it might have been changed by
 	 * a concurrent perf_event_context_sched_out().
 	 */
 	task = ctx->task;
-	if (task == TASK_TOMBSTONE) {
-		perf_ctx_unlock(cpuctx, ctx);
-		return;
-	}
+	if (task == TASK_TOMBSTONE)
+		goto out;
+
 	if (ctx->is_active) {
 		perf_ctx_unlock(cpuctx, ctx);
+		local_irq_restore(flags);
 		goto again;
 	}
 	func(event, NULL, ctx, data);
+out:
 	perf_ctx_unlock(cpuctx, ctx);
+	local_irq_restore(flags);
 }
 
 /*
Re: [tip: perf/core] perf: Fix event_function_call() locking
Posted by Peter Zijlstra 1 year, 5 months ago
On Tue, Aug 13, 2024 at 11:28:54AM -0700, Namhyung Kim wrote:

Duh, yeah.

> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9893ba5e98aa..85204c2376fa 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -298,13 +298,14 @@ static int event_function(void *info)
>  static void event_function_call(struct perf_event *event, event_f func, void *data)
>  {
>  	struct perf_event_context *ctx = event->ctx;
> -	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> +	struct perf_cpu_context *cpuctx;
>  	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
>  	struct event_function_struct efs = {
>  		.event = event,
>  		.func = func,
>  		.data = data,
>  	};
> +	unsigned long flags;
>  
>  	if (!event->parent) {
>  		/*
> @@ -327,22 +328,27 @@ static void event_function_call(struct perf_event *event, event_f func, void *da
>  	if (!task_function_call(task, event_function, &efs))
>  		return;
>  
> +	local_irq_save(flags);

This can just be local_irq_disable() though, seeing how the fingered
commit replaced raw_spin_lock_irq().

I'll queue the below...

---
Subject: perf: Really fix event_function_call() locking
From: Namhyung Kim <namhyung@kernel.org>
Date: Tue Aug 13 22:55:11 CEST 2024

Commit 558abc7e3f89 ("perf: Fix event_function_call() locking") lost
IRQ disabling by mistake.

Fixes: 558abc7e3f89 ("perf: Fix event_function_call() locking")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -298,8 +298,8 @@ static int event_function(void *info)
 static void event_function_call(struct perf_event *event, event_f func, void *data)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
 	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
+	struct perf_cpu_context *cpuctx;
 	struct event_function_struct efs = {
 		.event = event,
 		.func = func,
@@ -327,22 +327,25 @@ static void event_function_call(struct p
 	if (!task_function_call(task, event_function, &efs))
 		return;
 
+	local_irq_disable();
+	cpuctx = this_cpu_ptr(&perf_cpu_context);
 	perf_ctx_lock(cpuctx, ctx);
 	/*
 	 * Reload the task pointer, it might have been changed by
 	 * a concurrent perf_event_context_sched_out().
 	 */
 	task = ctx->task;
-	if (task == TASK_TOMBSTONE) {
-		perf_ctx_unlock(cpuctx, ctx);
-		return;
-	}
+	if (task == TASK_TOMBSTONE)
+		goto unlock;
 	if (ctx->is_active) {
 		perf_ctx_unlock(cpuctx, ctx);
+		local_irq_enable();
 		goto again;
 	}
 	func(event, NULL, ctx, data);
+unlock:
 	perf_ctx_unlock(cpuctx, ctx);
+	local_irq_enable();
 }
 
 /*
Re: [tip: perf/core] perf: Fix event_function_call() locking
Posted by Pengfei Xu 1 year, 5 months ago
Hi Peter and Kim,

I tested this patch on top of 6.11.0-rc3-next-20240812.
This issue can not be reproduced in syzkaller reproducer.

Best Regards,
Thanks!

On 2024-08-13 at 23:02:09 +0200, Peter Zijlstra wrote:
> On Tue, Aug 13, 2024 at 11:28:54AM -0700, Namhyung Kim wrote:
> 
> Duh, yeah.
> 
> > ---
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 9893ba5e98aa..85204c2376fa 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -298,13 +298,14 @@ static int event_function(void *info)
> >  static void event_function_call(struct perf_event *event, event_f func, void *data)
> >  {
> >  	struct perf_event_context *ctx = event->ctx;
> > -	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> > +	struct perf_cpu_context *cpuctx;
> >  	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
> >  	struct event_function_struct efs = {
> >  		.event = event,
> >  		.func = func,
> >  		.data = data,
> >  	};
> > +	unsigned long flags;
> >  
> >  	if (!event->parent) {
> >  		/*
> > @@ -327,22 +328,27 @@ static void event_function_call(struct perf_event *event, event_f func, void *da
> >  	if (!task_function_call(task, event_function, &efs))
> >  		return;
> >  
> > +	local_irq_save(flags);
> 
> This can just be local_irq_disable() though, seeing how the fingered
> commit replaced raw_spin_lock_irq().
> 
> I'll queue the below...
> 
> ---
> Subject: perf: Really fix event_function_call() locking
> From: Namhyung Kim <namhyung@kernel.org>
> Date: Tue Aug 13 22:55:11 CEST 2024
> 
> Commit 558abc7e3f89 ("perf: Fix event_function_call() locking") lost
> IRQ disabling by mistake.
> 
> Fixes: 558abc7e3f89 ("perf: Fix event_function_call() locking")
> Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |   13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -298,8 +298,8 @@ static int event_function(void *info)
>  static void event_function_call(struct perf_event *event, event_f func, void *data)
>  {
>  	struct perf_event_context *ctx = event->ctx;
> -	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
>  	struct task_struct *task = READ_ONCE(ctx->task); /* verified in event_function */
> +	struct perf_cpu_context *cpuctx;
>  	struct event_function_struct efs = {
>  		.event = event,
>  		.func = func,
> @@ -327,22 +327,25 @@ static void event_function_call(struct p
>  	if (!task_function_call(task, event_function, &efs))
>  		return;
>  
> +	local_irq_disable();
> +	cpuctx = this_cpu_ptr(&perf_cpu_context);
>  	perf_ctx_lock(cpuctx, ctx);
>  	/*
>  	 * Reload the task pointer, it might have been changed by
>  	 * a concurrent perf_event_context_sched_out().
>  	 */
>  	task = ctx->task;
> -	if (task == TASK_TOMBSTONE) {
> -		perf_ctx_unlock(cpuctx, ctx);
> -		return;
> -	}
> +	if (task == TASK_TOMBSTONE)
> +		goto unlock;
>  	if (ctx->is_active) {
>  		perf_ctx_unlock(cpuctx, ctx);
> +		local_irq_enable();
>  		goto again;
>  	}
>  	func(event, NULL, ctx, data);
> +unlock:
>  	perf_ctx_unlock(cpuctx, ctx);
> +	local_irq_enable();
>  }
>  
>  /*