kernel/events/uprobes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Max Makarov reported kernel panic [1] in perf user callchain code.
The reason for that is the race between uprobe_free_utask and bpf
profiler code doing the perf user stack unwind and is triggered
within uprobe_free_utask function:
- after current->utask is freed and
- before current->utask is set to NULL
general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
...
? die_addr+0x36/0x90
? exc_general_protection+0x217/0x420
? asm_exc_general_protection+0x26/0x30
? is_uprobe_at_func_entry+0x28/0x80
perf_callchain_user+0x20a/0x360
get_perf_callchain+0x147/0x1d0
bpf_get_stackid+0x60/0x90
bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
? __smp_call_single_queue+0xad/0x120
bpf_overflow_handler+0x75/0x110
...
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:__kmem_cache_free+0x1cb/0x350
...
? uprobe_free_utask+0x62/0x80
? acct_collect+0x4c/0x220
uprobe_free_utask+0x62/0x80
mm_release+0x12/0xb0
do_exit+0x26b/0xaa0
__x64_sys_exit+0x1b/0x20
do_syscall_64+0x5a/0x80
It can be easily reproduced by running following commands in
separate terminals:
# while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
# bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'
Fixing this by making sure current->utask pointer is set to NULL
before we start to release the utask object.
[1] https://github.com/grafana/pyroscope/issues/3673
Reported-by: Max Makarov <maxpain@linux.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index fa04b14a7d72..5d71ef85420c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1915,6 +1915,7 @@ void uprobe_free_utask(struct task_struct *t)
if (!utask)
return;
+ t->utask = NULL;
WARN_ON_ONCE(utask->active_uprobe || utask->xol_vaddr);
timer_delete_sync(&utask->ri_timer);
@@ -1924,7 +1925,6 @@ void uprobe_free_utask(struct task_struct *t)
ri = free_ret_instance(ri, true /* cleanup_hprobe */);
kfree(utask);
- t->utask = NULL;
}
#define RI_TIMER_PERIOD (HZ / 10) /* 100 ms */
--
2.47.0
On Thu, Jan 9, 2025 at 6:14 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Max Makarov reported kernel panic [1] in perf user callchain code.
>
> The reason for that is the race between uprobe_free_utask and bpf
> profiler code doing the perf user stack unwind and is triggered
> within uprobe_free_utask function:
> - after current->utask is freed and
> - before current->utask is set to NULL
>
> general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
> RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
> ...
> ? die_addr+0x36/0x90
> ? exc_general_protection+0x217/0x420
> ? asm_exc_general_protection+0x26/0x30
> ? is_uprobe_at_func_entry+0x28/0x80
> perf_callchain_user+0x20a/0x360
> get_perf_callchain+0x147/0x1d0
> bpf_get_stackid+0x60/0x90
> bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
> ? __smp_call_single_queue+0xad/0x120
> bpf_overflow_handler+0x75/0x110
> ...
> asm_sysvec_apic_timer_interrupt+0x1a/0x20
> RIP: 0010:__kmem_cache_free+0x1cb/0x350
> ...
> ? uprobe_free_utask+0x62/0x80
> ? acct_collect+0x4c/0x220
> uprobe_free_utask+0x62/0x80
> mm_release+0x12/0xb0
> do_exit+0x26b/0xaa0
> __x64_sys_exit+0x1b/0x20
> do_syscall_64+0x5a/0x80
>
> It can be easily reproduced by running following commands in
> separate terminals:
>
> # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
> # bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'
>
> Fixing this by making sure current->utask pointer is set to NULL
> before we start to release the utask object.
>
> [1] https://github.com/grafana/pyroscope/issues/3673
> Reported-by: Max Makarov <maxpain@linux.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> kernel/events/uprobes.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
ah, it's interrupt/NMI checking current->utask, makes total sense,
thanks for the fix!
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index fa04b14a7d72..5d71ef85420c 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1915,6 +1915,7 @@ void uprobe_free_utask(struct task_struct *t)
> if (!utask)
> return;
>
> + t->utask = NULL;
> WARN_ON_ONCE(utask->active_uprobe || utask->xol_vaddr);
>
> timer_delete_sync(&utask->ri_timer);
> @@ -1924,7 +1925,6 @@ void uprobe_free_utask(struct task_struct *t)
> ri = free_ret_instance(ri, true /* cleanup_hprobe */);
>
> kfree(utask);
> - t->utask = NULL;
> }
>
> #define RI_TIMER_PERIOD (HZ / 10) /* 100 ms */
> --
> 2.47.0
>
On 1/9/25 3:14 PM, Jiri Olsa wrote:
> Max Makarov reported kernel panic [1] in perf user callchain code.
>
> The reason for that is the race between uprobe_free_utask and bpf
> profiler code doing the perf user stack unwind and is triggered
> within uprobe_free_utask function:
> - after current->utask is freed and
> - before current->utask is set to NULL
>
> general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
> RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
> ...
> ? die_addr+0x36/0x90
> ? exc_general_protection+0x217/0x420
> ? asm_exc_general_protection+0x26/0x30
> ? is_uprobe_at_func_entry+0x28/0x80
> perf_callchain_user+0x20a/0x360
> get_perf_callchain+0x147/0x1d0
> bpf_get_stackid+0x60/0x90
> bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
> ? __smp_call_single_queue+0xad/0x120
> bpf_overflow_handler+0x75/0x110
> ...
> asm_sysvec_apic_timer_interrupt+0x1a/0x20
> RIP: 0010:__kmem_cache_free+0x1cb/0x350
> ...
> ? uprobe_free_utask+0x62/0x80
> ? acct_collect+0x4c/0x220
> uprobe_free_utask+0x62/0x80
> mm_release+0x12/0xb0
> do_exit+0x26b/0xaa0
> __x64_sys_exit+0x1b/0x20
> do_syscall_64+0x5a/0x80
>
> It can be easily reproduced by running following commands in
> separate terminals:
>
> # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
> # bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'
>
> Fixing this by making sure current->utask pointer is set to NULL
> before we start to release the utask object.
Lets add Fixes tag for stable team:
Fixes: cfa7f3d2c526 ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
> [1] https://github.com/grafana/pyroscope/issues/3673
> Reported-by: Max Makarov <maxpain@linux.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
fwiw, the other version we were potentially thinking of was below, but
just moving the t->utask NULL assignment seemed better.
Thanks,
Daniel
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c75c482d4c52..05f9cedf2691 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2835,6 +2835,8 @@ static bool is_uprobe_at_func_entry(struct pt_regs *regs)
if (!current->utask)
return false;
+ if (!current->utask->active_uprobe)
+ return false;
auprobe = current->utask->auprobe;
if (!auprobe)
On Thu, Jan 09, 2025 at 03:41:26PM +0100, Daniel Borkmann wrote:
> On 1/9/25 3:14 PM, Jiri Olsa wrote:
> > Max Makarov reported kernel panic [1] in perf user callchain code.
> >
> > The reason for that is the race between uprobe_free_utask and bpf
> > profiler code doing the perf user stack unwind and is triggered
> > within uprobe_free_utask function:
> > - after current->utask is freed and
> > - before current->utask is set to NULL
> >
> > general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
> > RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
> > ...
> > ? die_addr+0x36/0x90
> > ? exc_general_protection+0x217/0x420
> > ? asm_exc_general_protection+0x26/0x30
> > ? is_uprobe_at_func_entry+0x28/0x80
> > perf_callchain_user+0x20a/0x360
> > get_perf_callchain+0x147/0x1d0
> > bpf_get_stackid+0x60/0x90
> > bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
> > ? __smp_call_single_queue+0xad/0x120
> > bpf_overflow_handler+0x75/0x110
> > ...
> > asm_sysvec_apic_timer_interrupt+0x1a/0x20
> > RIP: 0010:__kmem_cache_free+0x1cb/0x350
> > ...
> > ? uprobe_free_utask+0x62/0x80
> > ? acct_collect+0x4c/0x220
> > uprobe_free_utask+0x62/0x80
> > mm_release+0x12/0xb0
> > do_exit+0x26b/0xaa0
> > __x64_sys_exit+0x1b/0x20
> > do_syscall_64+0x5a/0x80
> >
> > It can be easily reproduced by running following commands in
> > separate terminals:
> >
> > # while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
> > # bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'
> >
> > Fixing this by making sure current->utask pointer is set to NULL
> > before we start to release the utask object.
>
> Lets add Fixes tag for stable team:
>
> Fixes: cfa7f3d2c526 ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
ugh right, thanks for finding that
jirka
>
> > [1] https://github.com/grafana/pyroscope/issues/3673
> > Reported-by: Max Makarov <maxpain@linux.com>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>
> fwiw, the other version we were potentially thinking of was below, but
> just moving the t->utask NULL assignment seemed better.
>
> Thanks,
> Daniel
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index c75c482d4c52..05f9cedf2691 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -2835,6 +2835,8 @@ static bool is_uprobe_at_func_entry(struct pt_regs *regs)
>
> if (!current->utask)
> return false;
> + if (!current->utask->active_uprobe)
> + return false;
>
> auprobe = current->utask->auprobe;
> if (!auprobe)
On 01/09, Jiri Olsa wrote: > > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -1915,6 +1915,7 @@ void uprobe_free_utask(struct task_struct *t) > if (!utask) > return; > > + t->utask = NULL; > WARN_ON_ONCE(utask->active_uprobe || utask->xol_vaddr); > > timer_delete_sync(&utask->ri_timer); > @@ -1924,7 +1925,6 @@ void uprobe_free_utask(struct task_struct *t) > ri = free_ret_instance(ri, true /* cleanup_hprobe */); > > kfree(utask); > - t->utask = NULL; Acked-by: Oleg Nesterov <oleg@redhat.com>
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: b583ef82b671c9a752fbe3e95bd4c1c51eab764d
Gitweb: https://git.kernel.org/tip/b583ef82b671c9a752fbe3e95bd4c1c51eab764d
Author: Jiri Olsa <jolsa@kernel.org>
AuthorDate: Thu, 09 Jan 2025 15:14:40 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 10 Jan 2025 09:28:01 +01:00
uprobes: Fix race in uprobe_free_utask
Max Makarov reported kernel panic [1] in perf user callchain code.
The reason for that is the race between uprobe_free_utask and bpf
profiler code doing the perf user stack unwind and is triggered
within uprobe_free_utask function:
- after current->utask is freed and
- before current->utask is set to NULL
general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
...
? die_addr+0x36/0x90
? exc_general_protection+0x217/0x420
? asm_exc_general_protection+0x26/0x30
? is_uprobe_at_func_entry+0x28/0x80
perf_callchain_user+0x20a/0x360
get_perf_callchain+0x147/0x1d0
bpf_get_stackid+0x60/0x90
bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
? __smp_call_single_queue+0xad/0x120
bpf_overflow_handler+0x75/0x110
...
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:__kmem_cache_free+0x1cb/0x350
...
? uprobe_free_utask+0x62/0x80
? acct_collect+0x4c/0x220
uprobe_free_utask+0x62/0x80
mm_release+0x12/0xb0
do_exit+0x26b/0xaa0
__x64_sys_exit+0x1b/0x20
do_syscall_64+0x5a/0x80
It can be easily reproduced by running following commands in
separate terminals:
# while :; do bpftrace -e 'uprobe:/bin/ls:_start { printf("hit\n"); }' -c ls; done
# bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'
Fixing this by making sure current->utask pointer is set to NULL
before we start to release the utask object.
[1] https://github.com/grafana/pyroscope/issues/3673
Fixes: cfa7f3d2c526 ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
Reported-by: Max Makarov <maxpain@linux.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250109141440.2692173-1-jolsa@kernel.org
---
kernel/events/uprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index fa04b14..5d71ef8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1915,6 +1915,7 @@ void uprobe_free_utask(struct task_struct *t)
if (!utask)
return;
+ t->utask = NULL;
WARN_ON_ONCE(utask->active_uprobe || utask->xol_vaddr);
timer_delete_sync(&utask->ri_timer);
@@ -1924,7 +1925,6 @@ void uprobe_free_utask(struct task_struct *t)
ri = free_ret_instance(ri, true /* cleanup_hprobe */);
kfree(utask);
- t->utask = NULL;
}
#define RI_TIMER_PERIOD (HZ / 10) /* 100 ms */
© 2016 - 2025 Red Hat, Inc.