[PATCH] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by chenyuan_fl@163.com 2 days, 18 hours ago
From: Yuan Chen <chenyuan@kylinos.cn>

There is a critical race condition in kprobe initialization that can lead to
NULL pointer dereference and kernel crash.

[1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
...
[1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
[1135630.269239] pc : kprobe_perf_func+0x30/0x260
[1135630.277643] lr : kprobe_dispatcher+0x44/0x60
[1135630.286041] sp : ffffaeff4977fa40
[1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
[1135630.302837] x27: 0000000000000000 x26: 0000000000000000
[1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
[1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
[1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
[1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
[1135630.349985] x17: 0000000000000000 x16: 0000000000000000
[1135630.359285] x15: 0000000000000000 x14: 0000000000000000
[1135630.368445] x13: 0000000000000000 x12: 0000000000000000
[1135630.377473] x11: 0000000000000000 x10: 0000000000000000
[1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
[1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
[1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
[1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
[1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
[1135630.429410] Call trace:
[1135630.434828]  kprobe_perf_func+0x30/0x260
[1135630.441661]  kprobe_dispatcher+0x44/0x60
[1135630.448396]  aggr_pre_handler+0x70/0xc8
[1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
[1135630.462435]  brk_handler+0xbc/0xd8
[1135630.468437]  do_debug_exception+0x84/0x138
[1135630.475074]  el1_dbg+0x18/0x8c
[1135630.480582]  security_file_permission+0x0/0xd0
[1135630.487426]  vfs_write+0x70/0x1c0
[1135630.493059]  ksys_write+0x5c/0xc8
[1135630.498638]  __arm64_sys_write+0x24/0x30
[1135630.504821]  el0_svc_common+0x78/0x130
[1135630.510838]  el0_svc_handler+0x38/0x78
[1135630.516834]  el0_svc+0x8/0x1b0

kernel/trace/trace_kprobe.c: 1308
0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
include/linux/compiler.h: 294
0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]

kernel/trace/trace_kprobe.c
1308: head = this_cpu_ptr(call->perf_events);
1309: 	if (hlist_empty(head))
1310:		return 0;

crash> struct trace_event_call -o
struct trace_event_call {
  ...
  [120] struct hlist_head *perf_events;  //(call->perf_event)
  ...
}

crash> struct trace_event_call ffffaf015340e528
struct trace_event_call {
  ...
  perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
  ...
}

Race Condition Analysis:

The race occurs between kprobe activation and perf_events initialization:

    CPU0                                    CPU1
    ====                                    ====
    perf_kprobe_init
      create_local_trace_kprobe
        alloc_trace_kprobe
          __register_trace_kprobe(tk);
            arm_kprobe(p);(1)← KPROBE ACTIVE
                                            Debug exception triggers
                                            ...
                                            kprobe_handler(regs)
      perf_trace_event_init                 ...
        tp_event->perf_events = list;(2)
        tp_event->class->reg (3)
                                            kprobe_dispatcher (tk->tp.flags & TP_FLAG_PROFILE)
                                              kprobe_perf_func
					        head = this_cpu_ptr(call->perf_events)(4)
                                                (perf_events is still NULL)

Critical Issue:

The race window is:
1. CPU0 calls `arm_kprobe(p)` at point (1) - kprobe becomes active
2. CPU1 triggers debug exception and reaches `kprobe_dispatcher`
3. CPU0 sets `tp_event->perf_events = list` at point (2) but this assignment
   may not be visible to CPU1 due to memory ordering
4. CPU0 calls `class->reg()` at point (3) setting `TP_FLAG_PROFILE`
5. CPU1 checks `(tk->tp.flags & TP_FLAG_PROFILE)` - condition passes
6. CPU1 calls `kprobe_perf_func()` and crashes at point (4)

The kprobe becomes active immediately at point (1), allowing concurrent CPUs
to enter kprobe handlers. However, `perf_events` assignment at point (2) happens
at a different time and may not be visible to other CPUs due to missing memory
barriers before point (3) sets `TP_FLAG_PROFILE`.

Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
---
 kernel/trace/trace_event_perf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a6bb7577e8c5..6eff8c9d6bae 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -113,6 +113,11 @@ static int perf_trace_event_reg(struct trace_event_call *tp_event,
 
 	tp_event->perf_events = list;
 
+	/* Ensure perf_events assignment is visible to all CPUs before enabling
+	 * profile functionality
+	 */
+	smp_mb();
+
 	if (!total_ref_count) {
 		char __percpu *buf;
 		int i;
-- 
2.39.5

Re: [PATCH] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Masami Hiramatsu (Google) 2 days, 15 hours ago
On Mon, 29 Sep 2025 04:11:22 +0100
chenyuan_fl@163.com wrote:

> From: Yuan Chen <chenyuan@kylinos.cn>
> 
> There is a critical race condition in kprobe initialization that can lead to
> NULL pointer dereference and kernel crash.

Good catch!

> 
> [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
> ...
> [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
> [1135630.269239] pc : kprobe_perf_func+0x30/0x260
> [1135630.277643] lr : kprobe_dispatcher+0x44/0x60
> [1135630.286041] sp : ffffaeff4977fa40
> [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
> [1135630.302837] x27: 0000000000000000 x26: 0000000000000000
> [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
> [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
> [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
> [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
> [1135630.349985] x17: 0000000000000000 x16: 0000000000000000
> [1135630.359285] x15: 0000000000000000 x14: 0000000000000000
> [1135630.368445] x13: 0000000000000000 x12: 0000000000000000
> [1135630.377473] x11: 0000000000000000 x10: 0000000000000000
> [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
> [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
> [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
> [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
> [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
> [1135630.429410] Call trace:
> [1135630.434828]  kprobe_perf_func+0x30/0x260
> [1135630.441661]  kprobe_dispatcher+0x44/0x60
> [1135630.448396]  aggr_pre_handler+0x70/0xc8
> [1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
> [1135630.462435]  brk_handler+0xbc/0xd8
> [1135630.468437]  do_debug_exception+0x84/0x138
> [1135630.475074]  el1_dbg+0x18/0x8c
> [1135630.480582]  security_file_permission+0x0/0xd0
> [1135630.487426]  vfs_write+0x70/0x1c0
> [1135630.493059]  ksys_write+0x5c/0xc8
> [1135630.498638]  __arm64_sys_write+0x24/0x30
> [1135630.504821]  el0_svc_common+0x78/0x130
> [1135630.510838]  el0_svc_handler+0x38/0x78
> [1135630.516834]  el0_svc+0x8/0x1b0
> 
> kernel/trace/trace_kprobe.c: 1308
> 0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
> include/linux/compiler.h: 294
> 0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]
> 
> kernel/trace/trace_kprobe.c
> 1308: head = this_cpu_ptr(call->perf_events);
> 1309: 	if (hlist_empty(head))
> 1310:		return 0;
> 
> crash> struct trace_event_call -o
> struct trace_event_call {
>   ...
>   [120] struct hlist_head *perf_events;  //(call->perf_event)
>   ...
> }
> 
> crash> struct trace_event_call ffffaf015340e528
> struct trace_event_call {
>   ...
>   perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
>   ...
> }
> 
> Race Condition Analysis:
> 
> The race occurs between kprobe activation and perf_events initialization:
> 
>     CPU0                                    CPU1
>     ====                                    ====
>     perf_kprobe_init
>       create_local_trace_kprobe
>         alloc_trace_kprobe
>           __register_trace_kprobe(tk);
>             arm_kprobe(p);(1)← KPROBE ACTIVE

This is strange: alloc_trace_kprobe() does not register kprobes,
but just allocates a trace_kprobe data structure.
Also, __register_trace_kprobe() should not activate the kprobe
event because it is not enabled.

To enable trace_kprobe, you should call kprobe_register()
via call->class->reg. That is (3).


>                                             Debug exception triggers
>                                             ...
>                                             kprobe_handler(regs)
>       perf_trace_event_init                 ...
>         tp_event->perf_events = list;(2)
>         tp_event->class->reg (3)

So after this point, kprobe is armed (swbp is installed).

Anyway, the point is
 - perf_trace_event_init() sets `tp_event->perf_events` and
   installs kprobes on CPU0. But kprobe_dispatcher() on CPU1
   sees call->perf_events == NULL. This is because the
   memory update is not yet visible on CPU1.

So I think the code is OK, but the explanation is not correct.
We need a memory barrier which ensures that the `perf_events`
update is visible on other CPUs. This can happen on machines with
a weak memory model.

Can you update the analysis part?

Thank you,


>                                             kprobe_dispatcher (tk->tp.flags & TP_FLAG_PROFILE)
>                                               kprobe_perf_func
> 					        head = this_cpu_ptr(call->perf_events)(4)
>                                                 (perf_events is still NULL)
> 
> Critical Issue:
> 
> The race window is:
> 1. CPU0 calls `arm_kprobe(p)` at point (1) - kprobe becomes active
> 2. CPU1 triggers debug exception and reaches `kprobe_dispatcher`
> 3. CPU0 sets `tp_event->perf_events = list` at point (2) but this assignment
>    may not be visible to CPU1 due to memory ordering
> 4. CPU0 calls `class->reg()` at point (3) setting `TP_FLAG_PROFILE`
> 5. CPU1 checks `(tk->tp.flags & TP_FLAG_PROFILE)` - condition passes
> 6. CPU1 calls `kprobe_perf_func()` and crashes at point (4)
> 
> The kprobe becomes active immediately at point (1), allowing concurrent CPUs
> to enter kprobe handlers. However, `perf_events` assignment at point (2) happens
> at a different time and may not be visible to other CPUs due to missing memory
> barriers before point (3) sets `TP_FLAG_PROFILE`.
> 
> Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
> ---
>  kernel/trace/trace_event_perf.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a6bb7577e8c5..6eff8c9d6bae 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -113,6 +113,11 @@ static int perf_trace_event_reg(struct trace_event_call *tp_event,
>  
>  	tp_event->perf_events = list;
>  
> +	/* Ensure perf_events assignment is visible to all CPUs before enabling
> +	 * profile functionality
> +	 */
> +	smp_mb();
> +
>  	if (!total_ref_count) {
>  		char __percpu *buf;
>  		int i;
> -- 
> 2.39.5
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
[PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by chenyuan_fl@163.com 2 days, 14 hours ago
From: Yuan Chen <chenyuan@kylinos.cn>

There is a critical race condition in kprobe initialization that can lead to
NULL pointer dereference and kernel crash.

[1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
...
[1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
[1135630.269239] pc : kprobe_perf_func+0x30/0x260
[1135630.277643] lr : kprobe_dispatcher+0x44/0x60
[1135630.286041] sp : ffffaeff4977fa40
[1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
[1135630.302837] x27: 0000000000000000 x26: 0000000000000000
[1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
[1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
[1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
[1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
[1135630.349985] x17: 0000000000000000 x16: 0000000000000000
[1135630.359285] x15: 0000000000000000 x14: 0000000000000000
[1135630.368445] x13: 0000000000000000 x12: 0000000000000000
[1135630.377473] x11: 0000000000000000 x10: 0000000000000000
[1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
[1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
[1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
[1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
[1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
[1135630.429410] Call trace:
[1135630.434828]  kprobe_perf_func+0x30/0x260
[1135630.441661]  kprobe_dispatcher+0x44/0x60
[1135630.448396]  aggr_pre_handler+0x70/0xc8
[1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
[1135630.462435]  brk_handler+0xbc/0xd8
[1135630.468437]  do_debug_exception+0x84/0x138
[1135630.475074]  el1_dbg+0x18/0x8c
[1135630.480582]  security_file_permission+0x0/0xd0
[1135630.487426]  vfs_write+0x70/0x1c0
[1135630.493059]  ksys_write+0x5c/0xc8
[1135630.498638]  __arm64_sys_write+0x24/0x30
[1135630.504821]  el0_svc_common+0x78/0x130
[1135630.510838]  el0_svc_handler+0x38/0x78
[1135630.516834]  el0_svc+0x8/0x1b0

kernel/trace/trace_kprobe.c: 1308
0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
include/linux/compiler.h: 294
0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]

kernel/trace/trace_kprobe.c
1308: head = this_cpu_ptr(call->perf_events);
1309:   if (hlist_empty(head))
1310:           return 0;

crash> struct trace_event_call -o
struct trace_event_call {
  ...
  [120] struct hlist_head *perf_events;  //(call->perf_event)
  ...
}

crash> struct trace_event_call ffffaf015340e528
struct trace_event_call {
  ...
  perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
  ...
}

Race Condition Analysis:

The race occurs between kprobe activation and perf_events initialization:

  CPU0                                    CPU1
  ====                                    ====
  perf_kprobe_init
    perf_trace_event_init
      tp_event->perf_events = list;(1)
      tp_event->class->reg (2)← KPROBE ACTIVE
                                          Debug exception triggers
                                          ...
                                          kprobe_dispatcher
                                            kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
                                              head = this_cpu_ptr(call->perf_events)(3)
                                              (perf_events is still NULL)

Problem:
1. CPU0 executes (1) assigning tp_event->perf_events = list
2. CPU0 executes (2) enabling kprobe functionality via class->reg()
3. CPU1 triggers and reaches kprobe_dispatcher
4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
5. CPU1 calls kprobe_perf_func() and crashes at (3) because
   call->perf_events is still NULL

The issue: Assignment in step 1 may not be visible to CPU1 due to
missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.

Add smp_mb() barrier between perf_events assignment and enabling
profile functionality to ensure visibility ordering across CPUs.

v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().

Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
---
 kernel/trace/trace_event_perf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a6bb7577e8c5..6eff8c9d6bae 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -113,6 +113,11 @@ static int perf_trace_event_reg(struct trace_event_call *tp_event,
 
 	tp_event->perf_events = list;
 
+	/* Ensure perf_events assignment is visible to all CPUs before enabling
+	 * profile functionality
+	 */
+	smp_mb();
+
 	if (!total_ref_count) {
 		char __percpu *buf;
 		int i;
-- 
2.39.5

Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Steven Rostedt 2 days, 12 hours ago
On Mon, 29 Sep 2025 07:57:31 +0100
chenyuan_fl@163.com wrote:

> From: Yuan Chen <chenyuan@kylinos.cn>
> 
> There is a critical race condition in kprobe initialization that can lead to
> NULL pointer dereference and kernel crash.
> 
> [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
> ...
> [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
> [1135630.269239] pc : kprobe_perf_func+0x30/0x260
> [1135630.277643] lr : kprobe_dispatcher+0x44/0x60
> [1135630.286041] sp : ffffaeff4977fa40
> [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
> [1135630.302837] x27: 0000000000000000 x26: 0000000000000000
> [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
> [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
> [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
> [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
> [1135630.349985] x17: 0000000000000000 x16: 0000000000000000
> [1135630.359285] x15: 0000000000000000 x14: 0000000000000000
> [1135630.368445] x13: 0000000000000000 x12: 0000000000000000
> [1135630.377473] x11: 0000000000000000 x10: 0000000000000000
> [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
> [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
> [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
> [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
> [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
> [1135630.429410] Call trace:
> [1135630.434828]  kprobe_perf_func+0x30/0x260
> [1135630.441661]  kprobe_dispatcher+0x44/0x60
> [1135630.448396]  aggr_pre_handler+0x70/0xc8
> [1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
> [1135630.462435]  brk_handler+0xbc/0xd8
> [1135630.468437]  do_debug_exception+0x84/0x138
> [1135630.475074]  el1_dbg+0x18/0x8c
> [1135630.480582]  security_file_permission+0x0/0xd0
> [1135630.487426]  vfs_write+0x70/0x1c0
> [1135630.493059]  ksys_write+0x5c/0xc8
> [1135630.498638]  __arm64_sys_write+0x24/0x30
> [1135630.504821]  el0_svc_common+0x78/0x130
> [1135630.510838]  el0_svc_handler+0x38/0x78
> [1135630.516834]  el0_svc+0x8/0x1b0
> 
> kernel/trace/trace_kprobe.c: 1308
> 0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
> include/linux/compiler.h: 294
> 0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]
> 
> kernel/trace/trace_kprobe.c
> 1308: head = this_cpu_ptr(call->perf_events);
> 1309:   if (hlist_empty(head))
> 1310:           return 0;
> 
> crash> struct trace_event_call -o  
> struct trace_event_call {
>   ...
>   [120] struct hlist_head *perf_events;  //(call->perf_event)
>   ...
> }
> 
> crash> struct trace_event_call ffffaf015340e528  
> struct trace_event_call {
>   ...
>   perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
>   ...
> }
> 
> Race Condition Analysis:
> 
> The race occurs between kprobe activation and perf_events initialization:
> 
>   CPU0                                    CPU1
>   ====                                    ====
>   perf_kprobe_init
>     perf_trace_event_init
>       tp_event->perf_events = list;(1)
>       tp_event->class->reg (2)← KPROBE ACTIVE
>                                           Debug exception triggers
>                                           ...
>                                           kprobe_dispatcher
>                                             kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
>                                               head = this_cpu_ptr(call->perf_events)(3)
>                                               (perf_events is still NULL)
> 
> Problem:
> 1. CPU0 executes (1) assigning tp_event->perf_events = list
> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
> 3. CPU1 triggers and reaches kprobe_dispatcher
> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
>    call->perf_events is still NULL
> 
> The issue: Assignment in step 1 may not be visible to CPU1 due to
> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
> 
> Add smp_mb() barrier between perf_events assignment and enabling
> profile functionality to ensure visibility ordering across CPUs.
> 
> v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().
> 
> Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
> ---
>  kernel/trace/trace_event_perf.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a6bb7577e8c5..6eff8c9d6bae 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -113,6 +113,11 @@ static int perf_trace_event_reg(struct trace_event_call *tp_event,
>  
>  	tp_event->perf_events = list;
>  
> +	/* Ensure perf_events assignment is visible to all CPUs before enabling
> +	 * profile functionality
> +	 */
> +	smp_mb();

So from other discussions I had with John and Sebastian (both Cc'd),
memory barriers are not for "making memory visible", but instead are
for interactions between memory and two different CPUs, where both CPUs
have memory barriers.

John or Sebastian, thoughts?

-- Steve


> +
>  	if (!total_ref_count) {
>  		char __percpu *buf;
>  		int i;

	
Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by John Ogness 2 days, 12 hours ago
On 2025-09-29, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 29 Sep 2025 07:57:31 +0100
> chenyuan_fl@163.com wrote:
>
>> From: Yuan Chen <chenyuan@kylinos.cn>
>> 
>> There is a critical race condition in kprobe initialization that can lead to
>> NULL pointer dereference and kernel crash.
>> 
>> [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
>> ...
>> [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
>> [1135630.269239] pc : kprobe_perf_func+0x30/0x260
>> [1135630.277643] lr : kprobe_dispatcher+0x44/0x60
>> [1135630.286041] sp : ffffaeff4977fa40
>> [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
>> [1135630.302837] x27: 0000000000000000 x26: 0000000000000000
>> [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
>> [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
>> [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
>> [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
>> [1135630.349985] x17: 0000000000000000 x16: 0000000000000000
>> [1135630.359285] x15: 0000000000000000 x14: 0000000000000000
>> [1135630.368445] x13: 0000000000000000 x12: 0000000000000000
>> [1135630.377473] x11: 0000000000000000 x10: 0000000000000000
>> [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
>> [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
>> [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
>> [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
>> [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
>> [1135630.429410] Call trace:
>> [1135630.434828]  kprobe_perf_func+0x30/0x260
>> [1135630.441661]  kprobe_dispatcher+0x44/0x60
>> [1135630.448396]  aggr_pre_handler+0x70/0xc8
>> [1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
>> [1135630.462435]  brk_handler+0xbc/0xd8
>> [1135630.468437]  do_debug_exception+0x84/0x138
>> [1135630.475074]  el1_dbg+0x18/0x8c
>> [1135630.480582]  security_file_permission+0x0/0xd0
>> [1135630.487426]  vfs_write+0x70/0x1c0
>> [1135630.493059]  ksys_write+0x5c/0xc8
>> [1135630.498638]  __arm64_sys_write+0x24/0x30
>> [1135630.504821]  el0_svc_common+0x78/0x130
>> [1135630.510838]  el0_svc_handler+0x38/0x78
>> [1135630.516834]  el0_svc+0x8/0x1b0
>> 
>> kernel/trace/trace_kprobe.c: 1308
>> 0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
>> include/linux/compiler.h: 294
>> 0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]
>> 
>> kernel/trace/trace_kprobe.c
>> 1308: head = this_cpu_ptr(call->perf_events);
>> 1309:   if (hlist_empty(head))
>> 1310:           return 0;
>> 
>> crash> struct trace_event_call -o  
>> struct trace_event_call {
>>   ...
>>   [120] struct hlist_head *perf_events;  //(call->perf_event)
>>   ...
>> }
>> 
>> crash> struct trace_event_call ffffaf015340e528  
>> struct trace_event_call {
>>   ...
>>   perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
>>   ...
>> }
>> 
>> Race Condition Analysis:
>> 
>> The race occurs between kprobe activation and perf_events initialization:
>> 
>>   CPU0                                    CPU1
>>   ====                                    ====
>>   perf_kprobe_init
>>     perf_trace_event_init
>>       tp_event->perf_events = list;(1)
>>       tp_event->class->reg (2)← KPROBE ACTIVE
>>                                           Debug exception triggers
>>                                           ...
>>                                           kprobe_dispatcher
>>                                             kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
>>                                               head = this_cpu_ptr(call->perf_events)(3)
>>                                               (perf_events is still NULL)

I do not know anything about the kprobe and perf internals. This email
should hopefully serve as a guide to where you need to place the
memory barrier _pair_. If I understand the problem description
correctly, you would need:

>> Problem:
>> 1. CPU0 executes (1) assigning tp_event->perf_events = list

smp_wmb()

>> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
>> 3. CPU1 triggers and reaches kprobe_dispatcher
>> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)

smp_rmb()

>> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
>>    call->perf_events is still NULL
>> 
>> The issue: Assignment in step 1 may not be visible to CPU1 due to
>> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.

A better explanation of the issue would be: CPU1 sees that kprobe
functionality is enabled but does not see that perf_events has been
assigned.

Add pairing read and write memory barriers to guarantee that if CPU1
sees that kprobe functionality is enabled, it must also see that
perf_events has been assigned.

Note that this could also be done more efficiently using a store_release
when setting the flag (in step 2) and a load_acquire when loading the
flag (in step 4).
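
A minimal sketch of that pairing, based on the call sites named in the
analysis above (placement and exact form are illustrative only, not a
concrete patch):

	/* CPU0 -- perf_trace_event_init() and the class->reg() path */
	tp_event->perf_events = list;			/* step 1 */
	smp_wmb();					/* pairs with smp_rmb() below */
	tp->event->flags |= TP_FLAG_PROFILE;		/* step 2 (set inside class->reg()) */

	/* CPU1 -- kprobe_dispatcher()/kprobe_perf_func() path */
	if (tp->event->flags & TP_FLAG_PROFILE) {	/* step 4 */
		smp_rmb();				/* pairs with smp_wmb() above */
		head = this_cpu_ptr(call->perf_events);	/* step 5: list is visible now */
	}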

John Ogness
Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Peter Zijlstra 2 days, 11 hours ago
On Mon, Sep 29, 2025 at 11:38:08AM +0206, John Ogness wrote:

> >> Problem:
> >> 1. CPU0 executes (1) assigning tp_event->perf_events = list
> 
> smp_wmb()
> 
> >> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
> >> 3. CPU1 triggers and reaches kprobe_dispatcher
> >> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> 
> smp_rmb()
> 
> >> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
> >>    call->perf_events is still NULL
> >> 
> >> The issue: Assignment in step 1 may not be visible to CPU1 due to
> >> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
> 
> A better explanation of the issue would be: CPU1 sees that kprobe
> functionality is enabled but does not see that perf_events has been
> assigned.
> 
> Add pairing read and write memory barriers to guarantee that if CPU1
> sees that kprobe functionality is enabled, it must also see that
> perf_events has been assigned.
> 
> Note that this could also be done more efficiently using a store_release
> when setting the flag (in step 2) and a load_acquire when loading the
> flag (in step 4).

The RELEASE+ACQUIRE is a better pattern for these cases. 

And I'll argue the barrier should be in 2 not 1, since it is 2 that sets
the flag checked in 4.  Any store before that flag might be affected,
not just the ->perf_events list.
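
Sketched out (illustrative only; in the real code the flag store is
buried inside class->reg()):

	/*
	 * CPU0, step 2: the release-store of the flag publishes every
	 * earlier store, including tp_event->perf_events.
	 */
	smp_store_release(&tp->event->flags, tp->event->flags | TP_FLAG_PROFILE);

	/*
	 * CPU1, step 4: the acquire-load pairs with the release above;
	 * seeing the flag implies seeing all stores made before it.
	 */
	if (smp_load_acquire(&tp->event->flags) & TP_FLAG_PROFILE)
		head = this_cpu_ptr(call->perf_events);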
Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Masami Hiramatsu (Google) 1 day, 12 hours ago
On Mon, 29 Sep 2025 12:12:59 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Sep 29, 2025 at 11:38:08AM +0206, John Ogness wrote:
> 
> > >> Problem:
> > >> 1. CPU0 executes (1) assigning tp_event->perf_events = list
> > 
> > smp_wmb()
> > 
> > >> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
> > >> 3. CPU1 triggers and reaches kprobe_dispatcher
> > >> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> > 
> > smp_rmb()
> > 
> > >> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
> > >>    call->perf_events is still NULL
> > >> 
> > >> The issue: Assignment in step 1 may not be visible to CPU1 due to
> > >> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
> > 
> > A better explanation of the issue would be: CPU1 sees that kprobe
> > functionality is enabled but does not see that perf_events has been
> > assigned.
> > 
> > Add pairing read and write memory barriers to guarantee that if CPU1
> > sees that kprobe functionality is enabled, it must also see that
> > perf_events has been assigned.
> > 
> > Note that this could also be done more efficiently using a store_release
> > when setting the flag (in step 2) and a load_acquire when loading the
> > flag (in step 4).
> 
> The RELEASE+ACQUIRE is a better pattern for these cases. 
> 
> And I'll argue the barrier should be in 2 not 1, since it is 2 that sets
> the flag checked in 4.  Any store before that flag might be affected,
> not just the ->perf_events list.

RELEASE+ACQUIRE ensures the memory ordering on the `same` CPU, so do we still need smp_rmb() and smp_wmb()? e.g.

perf_trace_event_init()
-----
tp_event->perf_events = list
/*(1)*/
smp_store_release(&tp->event->flags, newflag);
-----

kprobe_dispatcher()
-----
smp_load_acquire(&tk->tp.flags);
/*(2)*/
this_cpu_ptr(call->perf_events);
-----

This ensures
 - the flags update is shown on other CPUs
 - perf_events update is done before the flags update on the same CPU
 - perf_events load is done after the flags load on the same CPU

But we are still not sure the perf_events update is shown on other CPUs?

To ensure that, we still need smp_wmb() and smp_rmb() at (1) and (2).

Or we don't need smp_*mb()?

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Peter Zijlstra 1 day, 11 hours ago
On Tue, Sep 30, 2025 at 05:58:26PM +0900, Masami Hiramatsu wrote:
> On Mon, 29 Sep 2025 12:12:59 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Mon, Sep 29, 2025 at 11:38:08AM +0206, John Ogness wrote:
> > 
> > > >> Problem:
> > > >> 1. CPU0 executes (1) assigning tp_event->perf_events = list
> > > 
> > > smp_wmb()
> > > 
> > > >> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
> > > >> 3. CPU1 triggers and reaches kprobe_dispatcher
> > > >> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> > > 
> > > smp_rmb()
> > > 
> > > >> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
> > > >>    call->perf_events is still NULL
> > > >> 
> > > >> The issue: Assignment in step 1 may not be visible to CPU1 due to
> > > >> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
> > > 
> > > A better explanation of the issue would be: CPU1 sees that kprobe
> > > functionality is enabled but does not see that perf_events has been
> > > assigned.
> > > 
> > > Add pairing read and write memory barriers to guarantee that if CPU1
> > > sees that kprobe functionality is enabled, it must also see that
> > > perf_events has been assigned.
> > > 
> > > Note that this could also be done more efficiently using a store_release
> > > when setting the flag (in step 2) and a load_acquire when loading the
> > > flag (in step 4).
> > 
> > The RELEASE+ACQUIRE is a better pattern for these cases. 
> > 
> > And I'll argue the barrier should be in 2 not 1, since it is 2 that sets
> > the flag checked in 4.  Any store before that flag might be affected,
> > not just the ->perf_events list.
> 
> RELEASE+ACQUIRE ensures the memory ordering on the `same` CPU, so do we still need smp_rmb() and smp_wmb()? e.g.

Eh, no, that's wrong. RELEASE and ACQUIRE are SMP barriers.
Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Masami Hiramatsu (Google) 1 day, 5 hours ago
On Tue, 30 Sep 2025 12:10:52 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Sep 30, 2025 at 05:58:26PM +0900, Masami Hiramatsu wrote:
> > On Mon, 29 Sep 2025 12:12:59 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Mon, Sep 29, 2025 at 11:38:08AM +0206, John Ogness wrote:
> > > 
> > > > >> Problem:
> > > > >> 1. CPU0 executes (1) assigning tp_event->perf_events = list
> > > > 
> > > > smp_wmb()
> > > > 
> > > > >> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
> > > > >> 3. CPU1 triggers and reaches kprobe_dispatcher
> > > > >> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> > > > 
> > > > smp_rmb()
> > > > 
> > > > >> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
> > > > >>    call->perf_events is still NULL
> > > > >> 
> > > > >> The issue: Assignment in step 1 may not be visible to CPU1 due to
> > > > >> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
> > > > 
> > > > A better explanation of the issue would be: CPU1 sees that kprobe
> > > > functionality is enabled but does not see that perf_events has been
> > > > assigned.
> > > > 
> > > > Add pairing read and write memory barriers to guarantee that if CPU1
> > > > sees that kprobe functionality is enabled, it must also see that
> > > > perf_events has been assigned.
> > > > 
> > > > Note that this could also be done more efficiently using a store_release
> > > > when setting the flag (in step 2) and a load_acquire when loading the
> > > > flag (in step 4).
> > > 
> > > The RELEASE+ACQUIRE is a better pattern for these cases. 
> > > 
> > > And I'll argue the barrier should be in 2 not 1, since it is 2 that sets
> > > the flag checked in 4.  Any store before that flag might be affected,
> > > not just the ->perf_events list.
> > 
> > RELEASE+ACQUIRE ensures the memory ordering on the `same` CPU, so do we still need smp_rmb() and smp_wmb()? e.g.
> 
> Eh, no, that's wrong. RELEASE and ACQUIRE are SMP barriers.

OK, thanks for confirmation!


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
[PATCH v3] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by chenyuan_fl@163.com 1 day, 13 hours ago
From: Yuan Chen <chenyuan@kylinos.cn>

There is a critical race condition in kprobe initialization that can lead to
NULL pointer dereference and kernel crash.

[1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
...
[1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
[1135630.269239] pc : kprobe_perf_func+0x30/0x260
[1135630.277643] lr : kprobe_dispatcher+0x44/0x60
[1135630.286041] sp : ffffaeff4977fa40
[1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
[1135630.302837] x27: 0000000000000000 x26: 0000000000000000
[1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
[1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
[1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
[1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
[1135630.349985] x17: 0000000000000000 x16: 0000000000000000
[1135630.359285] x15: 0000000000000000 x14: 0000000000000000
[1135630.368445] x13: 0000000000000000 x12: 0000000000000000
[1135630.377473] x11: 0000000000000000 x10: 0000000000000000
[1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
[1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
[1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
[1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
[1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
[1135630.429410] Call trace:
[1135630.434828]  kprobe_perf_func+0x30/0x260
[1135630.441661]  kprobe_dispatcher+0x44/0x60
[1135630.448396]  aggr_pre_handler+0x70/0xc8
[1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
[1135630.462435]  brk_handler+0xbc/0xd8
[1135630.468437]  do_debug_exception+0x84/0x138
[1135630.475074]  el1_dbg+0x18/0x8c
[1135630.480582]  security_file_permission+0x0/0xd0
[1135630.487426]  vfs_write+0x70/0x1c0
[1135630.493059]  ksys_write+0x5c/0xc8
[1135630.498638]  __arm64_sys_write+0x24/0x30
[1135630.504821]  el0_svc_common+0x78/0x130
[1135630.510838]  el0_svc_handler+0x38/0x78
[1135630.516834]  el0_svc+0x8/0x1b0

kernel/trace/trace_kprobe.c: 1308
0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
include/linux/compiler.h: 294
0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]

kernel/trace/trace_kprobe.c
1308: head = this_cpu_ptr(call->perf_events);
1309: if (hlist_empty(head))
1310: 	return 0;

crash> struct trace_event_call -o
struct trace_event_call {
  ...
  [120] struct hlist_head *perf_events;  //(call->perf_event)
  ...
}

crash> struct trace_event_call ffffaf015340e528
struct trace_event_call {
  ...
  perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
  ...
}

Race Condition Analysis:

The race occurs between kprobe activation and perf_events initialization:

  CPU0                                    CPU1
  ====                                    ====
  perf_kprobe_init
    perf_trace_event_init
      tp_event->perf_events = list;(1)
      tp_event->class->reg (2)← KPROBE ACTIVE
                                          Debug exception triggers
                                          ...
                                          kprobe_dispatcher
                                            kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
                                              head = this_cpu_ptr(call->perf_events)(3)
                                              (perf_events is still NULL)

Problem:
1. CPU0 executes (1) assigning tp_event->perf_events = list
2. CPU0 executes (2) enabling kprobe functionality via class->reg()
3. CPU1 triggers and reaches kprobe_dispatcher
4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
5. CPU1 calls kprobe_perf_func() and crashes at (3) because
   call->perf_events is still NULL

CPU1 sees that kprobe functionality is enabled but does not see that
perf_events has been assigned.

Add pairing read and write memory barriers to guarantee that if CPU1
sees that kprobe functionality is enabled, it must also see that
perf_events has been assigned.

v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().
v2->v3: Adopt RELEASE/ACQUIRE semantics per Peter/John's suggestions,
        aligning with Steven's clarification on barrier purposes.

Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
---
 kernel/trace/trace_probe.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 842383fbc03b..98b838591edc 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -274,19 +274,19 @@ struct event_file_link {
 static inline bool trace_probe_test_flag(struct trace_probe *tp,
 					 unsigned int flag)
 {
-	return !!(tp->event->flags & flag);
+	return !!(smp_load_acquire(&tp->event->flags) & flag);
 }
 
 static inline void trace_probe_set_flag(struct trace_probe *tp,
 					unsigned int flag)
 {
-	tp->event->flags |= flag;
+	smp_store_release(&tp->event->flags, tp->event->flags | flag);
 }
 
 static inline void trace_probe_clear_flag(struct trace_probe *tp,
 					  unsigned int flag)
 {
-	tp->event->flags &= ~flag;
+	smp_store_release(&tp->event->flags, tp->event->flags & ~flag);
 }
 
 static inline bool trace_probe_is_enabled(struct trace_probe *tp)
-- 
2.39.5

Re: [PATCH v3] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by John Ogness 1 day, 12 hours ago
On 2025-09-30, chenyuan_fl@163.com wrote:
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 842383fbc03b..98b838591edc 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -274,19 +274,19 @@ struct event_file_link {
>  static inline bool trace_probe_test_flag(struct trace_probe *tp,
>  					 unsigned int flag)
>  {
> -	return !!(tp->event->flags & flag);
> +	return !!(smp_load_acquire(&tp->event->flags) & flag);
>  }
>  
>  static inline void trace_probe_set_flag(struct trace_probe *tp,
>  					unsigned int flag)
>  {
> -	tp->event->flags |= flag;
> +	smp_store_release(&tp->event->flags, tp->event->flags | flag);
>  }
>  
>  static inline void trace_probe_clear_flag(struct trace_probe *tp,
>  					  unsigned int flag)
>  {
> -	tp->event->flags &= ~flag;
> +	smp_store_release(&tp->event->flags, tp->event->flags & ~flag);
>  }
>  
>  static inline bool trace_probe_is_enabled(struct trace_probe *tp)

I don't have any feedback about the correctness with regards to tracing
and kprobes. However, I recommend writing a comment at each necessary
memory barrier site. The comment should mention the pairing memory
barrier and the ordering it is guaranteeing.
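
For example (wording illustrative), the pair above could be annotated
along these lines:

static inline bool trace_probe_test_flag(struct trace_probe *tp,
					 unsigned int flag)
{
	/*
	 * Pairs with smp_store_release() in trace_probe_set_flag():
	 * if the flag is observed, every store made before the flag
	 * was set (e.g. tp_event->perf_events) is observed as well.
	 */
	return !!(smp_load_acquire(&tp->event->flags) & flag);
}

static inline void trace_probe_set_flag(struct trace_probe *tp,
					unsigned int flag)
{
	/*
	 * Pairs with smp_load_acquire() in trace_probe_test_flag():
	 * publish all earlier stores before the flag becomes visible.
	 */
	smp_store_release(&tp->event->flags, tp->event->flags | flag);
}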

John
Re: [PATCH v3] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Peter Zijlstra 1 day, 12 hours ago
On Tue, Sep 30, 2025 at 09:18:48AM +0100, chenyuan_fl@163.com wrote:

> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 842383fbc03b..98b838591edc 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -274,19 +274,19 @@ struct event_file_link {
>  static inline bool trace_probe_test_flag(struct trace_probe *tp,
>  					 unsigned int flag)
>  {
> -	return !!(tp->event->flags & flag);
> +	return !!(smp_load_acquire(&tp->event->flags) & flag);
>  }
>  
>  static inline void trace_probe_set_flag(struct trace_probe *tp,
>  					unsigned int flag)
>  {
> -	tp->event->flags |= flag;
> +	smp_store_release(&tp->event->flags, tp->event->flags | flag);
>  }
>  
>  static inline void trace_probe_clear_flag(struct trace_probe *tp,
>  					  unsigned int flag)
>  {
> -	tp->event->flags &= ~flag;
> +	smp_store_release(&tp->event->flags, tp->event->flags & ~flag);
>  }


I _think_ the clear one is superfluous. Is there anything that cares
about stores done before the clear when the flag is found not set?

Also, code like:

static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
			     unsigned long ret_ip, struct ftrace_regs *fregs,
			     void *entry_data)
{
	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
	int ret = 0;

	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
		fentry_trace_func(tf, entry_ip, fregs);

#ifdef CONFIG_PERF_EVENTS
	if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
		ret = fentry_perf_func(tf, entry_ip, fregs);
#endif
	return ret;
}


Will now have two barriers; where one would suffice, eg.

	flags = smp_load_acquire(&tp->event->flags);

	if (flags & TP_FLAG_TRACE)
		fentry_trace_func(...);

	if (flags & TP_FLAG_PROFILE)
		fentry_perf_func(...);

Should be just fine afaict.


Is this something anybody cares about?
Re: [PATCH v3] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Masami Hiramatsu (Google) 1 day, 5 hours ago
On Tue, 30 Sep 2025 10:46:45 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Sep 30, 2025 at 09:18:48AM +0100, chenyuan_fl@163.com wrote:
> 
> > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > index 842383fbc03b..98b838591edc 100644
> > --- a/kernel/trace/trace_probe.h
> > +++ b/kernel/trace/trace_probe.h
> > @@ -274,19 +274,19 @@ struct event_file_link {
> >  static inline bool trace_probe_test_flag(struct trace_probe *tp,
> >  					 unsigned int flag)
> >  {
> > -	return !!(tp->event->flags & flag);
> > +	return !!(smp_load_acquire(&tp->event->flags) & flag);
> >  }
> >  
> >  static inline void trace_probe_set_flag(struct trace_probe *tp,
> >  					unsigned int flag)
> >  {
> > -	tp->event->flags |= flag;
> > +	smp_store_release(&tp->event->flags, tp->event->flags | flag);
> >  }
> >  
> >  static inline void trace_probe_clear_flag(struct trace_probe *tp,
> >  					  unsigned int flag)
> >  {
> > -	tp->event->flags &= ~flag;
> > +	smp_store_release(&tp->event->flags, tp->event->flags & ~flag);
> >  }
> 
> 
> I _think_ the clear one is superfluous. Is there anything that cares
> about stores done before the clear when the flag is found not set?
> 
> Also, code like:
> 
> static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
> 			     unsigned long ret_ip, struct ftrace_regs *fregs,
> 			     void *entry_data)
> {
> 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
> 	int ret = 0;
> 
> 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
> 		fentry_trace_func(tf, entry_ip, fregs);
> 
> #ifdef CONFIG_PERF_EVENTS
> 	if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
> 		ret = fentry_perf_func(tf, entry_ip, fregs);
> #endif
> 	return ret;
> }
> 
> 
> Will now have two barriers; where one would suffice, eg.
> 
> 	flags = smp_load_acquire(&tp->event->flags);
> 
> 	if (flags & TP_FLAG_TRACE)
> 		fentry_trace_func(...);
> 
> 	if (flags & TP_FLAG_PROFILE)
> 		fentry_perf_func(...);
> 
> Should be just fine afaict.

Looks good to me. We should replace trace_probe_test_flag()
with trace_probe_load_flag().

Thanks,

> 
> 
> Is this something anybody cares about?


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
[PATCH v4] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by chenyuan_fl@163.com 19 hours ago
From: Yuan Chen <chenyuan@kylinos.cn>

There is a critical race condition in kprobe initialization that can lead to
NULL pointer dereference and kernel crash.

[1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
...
[1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
[1135630.269239] pc : kprobe_perf_func+0x30/0x260
[1135630.277643] lr : kprobe_dispatcher+0x44/0x60
[1135630.286041] sp : ffffaeff4977fa40
[1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
[1135630.302837] x27: 0000000000000000 x26: 0000000000000000
[1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
[1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
[1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
[1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
[1135630.349985] x17: 0000000000000000 x16: 0000000000000000
[1135630.359285] x15: 0000000000000000 x14: 0000000000000000
[1135630.368445] x13: 0000000000000000 x12: 0000000000000000
[1135630.377473] x11: 0000000000000000 x10: 0000000000000000
[1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
[1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
[1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
[1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
[1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
[1135630.429410] Call trace:
[1135630.434828]  kprobe_perf_func+0x30/0x260
[1135630.441661]  kprobe_dispatcher+0x44/0x60
[1135630.448396]  aggr_pre_handler+0x70/0xc8
[1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
[1135630.462435]  brk_handler+0xbc/0xd8
[1135630.468437]  do_debug_exception+0x84/0x138
[1135630.475074]  el1_dbg+0x18/0x8c
[1135630.480582]  security_file_permission+0x0/0xd0
[1135630.487426]  vfs_write+0x70/0x1c0
[1135630.493059]  ksys_write+0x5c/0xc8
[1135630.498638]  __arm64_sys_write+0x24/0x30
[1135630.504821]  el0_svc_common+0x78/0x130
[1135630.510838]  el0_svc_handler+0x38/0x78
[1135630.516834]  el0_svc+0x8/0x1b0

kernel/trace/trace_kprobe.c: 1308
0xffff3df8995039ec <kprobe_perf_func+0x2c>:     ldr     x21, [x24,#120]
include/linux/compiler.h: 294
0xffff3df8995039f0 <kprobe_perf_func+0x30>:     ldr     x1, [x21,x0]

kernel/trace/trace_kprobe.c
1308: head = this_cpu_ptr(call->perf_events);
1309: if (hlist_empty(head))
1310: 	return 0;

crash> struct trace_event_call -o
struct trace_event_call {
  ...
  [120] struct hlist_head *perf_events;  //(call->perf_event)
  ...
}

crash> struct trace_event_call ffffaf015340e528
struct trace_event_call {
  ...
  perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
  ...
}

Race Condition Analysis:

The race occurs between kprobe activation and perf_events initialization:

  CPU0                                    CPU1
  ====                                    ====
  perf_kprobe_init
    perf_trace_event_init
      tp_event->perf_events = list;(1)
      tp_event->class->reg (2)← KPROBE ACTIVE
                                          Debug exception triggers
                                          ...
                                          kprobe_dispatcher
                                            kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
                                              head = this_cpu_ptr(call->perf_events)(3)
                                              (perf_events is still NULL)

Problem:
1. CPU0 executes (1) assigning tp_event->perf_events = list
2. CPU0 executes (2) enabling kprobe functionality via class->reg()
3. CPU1 triggers and reaches kprobe_dispatcher
4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
5. CPU1 calls kprobe_perf_func() and crashes at (3) because
   call->perf_events is still NULL

CPU1 sees that kprobe functionality is enabled but does not see that
perf_events has been assigned.

Add pairing read and write memory barriers to guarantee that if CPU1
sees that kprobe functionality is enabled, it must also see that
perf_events has been assigned.

v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().
v2->v3: Adopt RELEASE/ACQUIRE semantics per Peter/John's suggestions,
        aligning with Steven's clarification on barrier purposes.
v3->v4: Introduce load_flag() (Masami) and optimize barrier usage in
        checks/clear (Peter).

Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
---
 kernel/trace/trace_fprobe.c | 10 ++++++----
 kernel/trace/trace_kprobe.c | 11 +++++++----
 kernel/trace/trace_probe.h  |  9 +++++++--
 kernel/trace/trace_uprobe.c | 12 ++++++++----
 4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index b36ade43d4b3..ad9d6347b5fa 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -522,13 +522,14 @@ static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
 			     void *entry_data)
 {
 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+	unsigned int flags = trace_probe_load_flag(&tf->tp);
 	int ret = 0;
 
-	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
+	if (flags & TP_FLAG_TRACE)
 		fentry_trace_func(tf, entry_ip, fregs);
 
 #ifdef CONFIG_PERF_EVENTS
-	if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
+	if (flags & TP_FLAG_PROFILE)
 		ret = fentry_perf_func(tf, entry_ip, fregs);
 #endif
 	return ret;
@@ -540,11 +541,12 @@ static void fexit_dispatcher(struct fprobe *fp, unsigned long entry_ip,
 			     void *entry_data)
 {
 	struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+	unsigned int flags = trace_probe_load_flag(&tf->tp);
 
-	if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
+	if (flags & TP_FLAG_TRACE)
 		fexit_trace_func(tf, entry_ip, ret_ip, fregs, entry_data);
 #ifdef CONFIG_PERF_EVENTS
-	if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
+	if (flags & TP_FLAG_PROFILE)
 		fexit_perf_func(tf, entry_ip, ret_ip, fregs, entry_data);
 #endif
 }
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ccae62d4fb91..b1b793b8f191 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1813,14 +1813,15 @@ static int kprobe_register(struct trace_event_call *event,
 static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
 {
 	struct trace_kprobe *tk = container_of(kp, struct trace_kprobe, rp.kp);
+	unsigned int flags = trace_probe_load_flag(&tk->tp);
 	int ret = 0;
 
 	raw_cpu_inc(*tk->nhit);
 
-	if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
+	if (flags & TP_FLAG_TRACE)
 		kprobe_trace_func(tk, regs);
 #ifdef CONFIG_PERF_EVENTS
-	if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
+	if (flags & TP_FLAG_PROFILE)
 		ret = kprobe_perf_func(tk, regs);
 #endif
 	return ret;
@@ -1832,6 +1833,7 @@ kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
 {
 	struct kretprobe *rp = get_kretprobe(ri);
 	struct trace_kprobe *tk;
+	unsigned int flags;
 
 	/*
 	 * There is a small chance that get_kretprobe(ri) returns NULL when
@@ -1844,10 +1846,11 @@ kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
 	tk = container_of(rp, struct trace_kprobe, rp);
 	raw_cpu_inc(*tk->nhit);
 
-	if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
+	flags = trace_probe_load_flag(&tk->tp);
+	if (flags & TP_FLAG_TRACE)
 		kretprobe_trace_func(tk, ri, regs);
 #ifdef CONFIG_PERF_EVENTS
-	if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
+	if (flags & TP_FLAG_PROFILE)
 		kretprobe_perf_func(tk, ri, regs);
 #endif
 	return 0;	/* We don't tweak kernel, so just return 0 */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 842383fbc03b..08b5bda24da2 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -271,16 +271,21 @@ struct event_file_link {
 	struct list_head		list;
 };
 
+static inline unsigned int trace_probe_load_flag(struct trace_probe *tp)
+{
+	return smp_load_acquire(&tp->event->flags);
+}
+
 static inline bool trace_probe_test_flag(struct trace_probe *tp,
 					 unsigned int flag)
 {
-	return !!(tp->event->flags & flag);
+	return !!(trace_probe_load_flag(tp) & flag);
 }
 
 static inline void trace_probe_set_flag(struct trace_probe *tp,
 					unsigned int flag)
 {
-	tp->event->flags |= flag;
+	smp_store_release(&tp->event->flags, tp->event->flags | flag);
 }
 
 static inline void trace_probe_clear_flag(struct trace_probe *tp,
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8b0bcc0d8f41..430d09c49462 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1547,6 +1547,7 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs,
 	struct trace_uprobe *tu;
 	struct uprobe_dispatch_data udd;
 	struct uprobe_cpu_buffer *ucb = NULL;
+	unsigned int flags;
 	int ret = 0;
 
 	tu = container_of(con, struct trace_uprobe, consumer);
@@ -1561,11 +1562,12 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs,
 	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
 		return 0;
 
-	if (trace_probe_test_flag(&tu->tp, TP_FLAG_TRACE))
+	flags = trace_probe_load_flag(&tu->tp);
+	if (flags & TP_FLAG_TRACE)
 		ret |= uprobe_trace_func(tu, regs, &ucb);
 
 #ifdef CONFIG_PERF_EVENTS
-	if (trace_probe_test_flag(&tu->tp, TP_FLAG_PROFILE))
+	if (flags & TP_FLAG_PROFILE)
 		ret |= uprobe_perf_func(tu, regs, &ucb);
 #endif
 	uprobe_buffer_put(ucb);
@@ -1579,6 +1581,7 @@ static int uretprobe_dispatcher(struct uprobe_consumer *con,
 	struct trace_uprobe *tu;
 	struct uprobe_dispatch_data udd;
 	struct uprobe_cpu_buffer *ucb = NULL;
+	unsigned int flags;
 
 	tu = container_of(con, struct trace_uprobe, consumer);
 
@@ -1590,11 +1593,12 @@ static int uretprobe_dispatcher(struct uprobe_consumer *con,
 	if (WARN_ON_ONCE(!uprobe_cpu_buffer))
 		return 0;
 
-	if (trace_probe_test_flag(&tu->tp, TP_FLAG_TRACE))
+	flags = trace_probe_load_flag(&tu->tp);
+	if (flags & TP_FLAG_TRACE)
 		uretprobe_trace_func(tu, func, regs, &ucb);
 
 #ifdef CONFIG_PERF_EVENTS
-	if (trace_probe_test_flag(&tu->tp, TP_FLAG_PROFILE))
+	if (flags & TP_FLAG_PROFILE)
 		uretprobe_perf_func(tu, func, regs, &ucb);
 #endif
 	uprobe_buffer_put(ucb);
-- 
2.39.5

Re: [PATCH v4] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Peter Zijlstra 9 hours ago
On Wed, Oct 01, 2025 at 03:20:25AM +0100, chenyuan_fl@163.com wrote:

> 
> v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().
> v2->v3: Adopt RELEASE/ACQUIRE semantics per Peter/John's suggestions,
>         aligning with Steven's clarification on barrier purposes.
> v3->v4: Introduce load_flag() (Masami) and optimize barrier usage in
>         checks/clear (Peter).

Stuff like this ought to go below the --- such that git-am and similar
tools throw it out.
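
IOW, roughly like this (schematic; diffstat elided): git-am builds the
changelog from everything above the first "---" and silently drops the
notes between that line and the diff:

    Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
    ---
    v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().
    v2->v3: Adopt RELEASE/ACQUIRE semantics per Peter/John's suggestions.
    v3->v4: Introduce load_flag() (Masami) and optimize barrier usage in
            checks/clear (Peter).

    (diffstat and diff follow here)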

> Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
> ---

> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index ccae62d4fb91..b1b793b8f191 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -1813,14 +1813,15 @@ static int kprobe_register(struct trace_event_call *event,
>  static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
>  {
>  	struct trace_kprobe *tk = container_of(kp, struct trace_kprobe, rp.kp);
> +	unsigned int flags = trace_probe_load_flag(&tk->tp);
>  	int ret = 0;
>  
>  	raw_cpu_inc(*tk->nhit);
>  
> -	if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
> +	if (flags & TP_FLAG_TRACE)
>  		kprobe_trace_func(tk, regs);
>  #ifdef CONFIG_PERF_EVENTS
> -	if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
> +	if (flags & TP_FLAG_PROFILE)
>  		ret = kprobe_perf_func(tk, regs);
>  #endif
>  	return ret;
> @@ -1832,6 +1833,7 @@ kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
>  {
>  	struct kretprobe *rp = get_kretprobe(ri);
>  	struct trace_kprobe *tk;
> +	unsigned int flags;
>  
>  	/*
>  	 * There is a small chance that get_kretprobe(ri) returns NULL when
> @@ -1844,10 +1846,11 @@ kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
>  	tk = container_of(rp, struct trace_kprobe, rp);
>  	raw_cpu_inc(*tk->nhit);
>  
> -	if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
> +	flags = trace_probe_load_flag(&tk->tp);
> +	if (flags & TP_FLAG_TRACE)
>  		kretprobe_trace_func(tk, ri, regs);
>  #ifdef CONFIG_PERF_EVENTS
> -	if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
> +	if (flags & TP_FLAG_PROFILE)
>  		kretprobe_perf_func(tk, ri, regs);
>  #endif
>  	return 0;	/* We don't tweak kernel, so just return 0 */


These two are inconsistent in that the first does load_flag before
inc(nhit) while the second does it after. I don't think it matters, but
I noticed the inconsistent style and had to mention it.


Anyway, patch looks reasonable now. Rest is up to Steve, this is his
code.
Re: [PATCH v4] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Steven Rostedt 7 hours ago
On Wed, 1 Oct 2025 14:32:00 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Anyway, patch looks reasonable now. Rest is up to Steve, this is his
> code.

Actually, it's Masami's ;-)  And it's up to him.

-- Steve
Re: [PATCH v2] tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
Posted by Peter Zijlstra 2 days, 12 hours ago
On Mon, Sep 29, 2025 at 04:48:36AM -0400, Steven Rostedt wrote:
> On Mon, 29 Sep 2025 07:57:31 +0100
> chenyuan_fl@163.com wrote:
> 
> > From: Yuan Chen <chenyuan@kylinos.cn>
> > 
> > There is a critical race condition in kprobe initialization that can lead to
> > NULL pointer dereference and kernel crash.
> > 
> > [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000

> > [1135630.434828]  kprobe_perf_func+0x30/0x260
> > [1135630.441661]  kprobe_dispatcher+0x44/0x60
> > [1135630.448396]  aggr_pre_handler+0x70/0xc8
> > [1135630.454959]  kprobe_breakpoint_handler+0x140/0x1e0
> > [1135630.462435]  brk_handler+0xbc/0xd8
> > [1135630.468437]  do_debug_exception+0x84/0x138

> > 
> > kernel/trace/trace_kprobe.c
> > 1308: head = this_cpu_ptr(call->perf_events);
> > 1309:   if (hlist_empty(head))
> > 1310:           return 0;
> > 
> > crash> struct trace_event_call -o  
> > struct trace_event_call {
> >   ...
> >   [120] struct hlist_head *perf_events;  //(call->perf_event)
> >   ...
> > }
> > 
> > crash> struct trace_event_call ffffaf015340e528  
> > struct trace_event_call {
> >   ...
> >   perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
> >   ...
> > }
> > 
> > Race Condition Analysis:
> > 
> > The race occurs between kprobe activation and perf_events initialization:
> > 
> >   CPU0                                    CPU1
> >   ====                                    ====
> >   perf_kprobe_init
> >     perf_trace_event_init
> >       tp_event->perf_events = list;(1)
> >       tp_event->class->reg (2)← KPROBE ACTIVE
> >                                           Debug exception triggers
> >                                           ...
> >                                           kprobe_dispatcher
> >                                             kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
> >                                               head = this_cpu_ptr(call->perf_events)(3)
> >                                               (perf_events is still NULL)
> > 
> > Problem:
> > 1. CPU0 executes (1) assigning tp_event->perf_events = list
> > 2. CPU0 executes (2) enabling kprobe functionality via class->reg()

This is kprobe_register() doing enable_trace_kprobe() ?

> > 3. CPU1 triggers and reaches kprobe_dispatcher
> > 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> > 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
> >    call->perf_events is still NULL
> > 
> > The issue: Assignment in step 1 may not be visible to CPU1 due to
> > missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
> > 
> > Add smp_mb() barrier between perf_events assignment and enabling
> > profile functionality to ensure visibility ordering across CPUs.

Yeah, that cannot be right.

> > Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
> > ---
> >  kernel/trace/trace_event_perf.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > index a6bb7577e8c5..6eff8c9d6bae 100644
> > --- a/kernel/trace/trace_event_perf.c
> > +++ b/kernel/trace/trace_event_perf.c
> > @@ -113,6 +113,11 @@ static int perf_trace_event_reg(struct trace_event_call *tp_event,
> >  
> >  	tp_event->perf_events = list;
> >  
> > +	/* Ensure perf_events assignment is visible to all CPUs before enabling
> > +	 * profile functionality
> > +	 */
> > +	smp_mb();
> 
> So from other discussions I had with John and Sebastian (both Cc'd),
> memory barriers are not for "making memory visible", but instead are
> for interactions between memory and two different CPUs, where both CPUs
> have memory barriers.

Correct, barriers have to be paired. The above doesn't have enough clues
for me to know what code is affected, but if we're talking about
kprobe_register(PERF_REG) := enable_trace_kprobe(,NULL), then it might
be that trace_probe_set_flag() should be an smp_store_release(), while
trace_probe_test_flag(PROFILE) in kprobe_dispatcher() needs to be a
smp_load_acquire().

Without the acquire it might still be possible for the CPU to lift the
call->perf_event load up before the event->flags load, rendering your
wmb pointless.

The guarantee you're looking for is that whoever sees the flag set also
sees a fully formed event. This is done with RELEASE on publish and
ACQUIRE on access.
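
To spell the pairing out (schematic only; tp is the &tk->tp trace_probe
and tp_event/call the trace_event_call, glossing over the class->reg() /
enable_trace_kprobe() call chain in between):

    /* publish side (perf_trace_event_reg(), then the reg path): */
    tp_event->perf_events = list;                               /* A */
    smp_store_release(&tp->event->flags,
                      tp->event->flags | TP_FLAG_PROFILE);      /* B */

    /* observe side (kprobe_dispatcher(), possibly on another CPU): */
    flags = smp_load_acquire(&tp->event->flags);    /* pairs with B */
    if (flags & TP_FLAG_PROFILE)
            head = this_cpu_ptr(call->perf_events); /* must observe A */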