[PATCH] [PATCH v2] perf/x86/amd: check event before enable to avoid GPF

George Kennedy posted 1 patch 1 month, 4 weeks ago
There is a newer version of this series
arch/x86/events/amd/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] [PATCH v2] perf/x86/amd: check event before enable to avoid GPF
Posted by George Kennedy 1 month, 4 weeks ago
On AMD machines cpuc->events[idx] can become NULL in a subtle race
condition with NMI->throttle->x86_pmu_stop().

Check event for NULL in amd_pmu_enable_all() before enable to avoid a GPF.
This appears to be an AMD only issue.

Syzkaller reported a GPF in amd_pmu_enable_all.

INFO: NMI handler (perf_event_nmi_handler) took too long to run: 13.143
    msecs
Oops: general protection fault, probably for non-canonical address
    0xdffffc0000000034: 0000  PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000001a0-0x00000000000001a7]
CPU: 0 UID: 0 PID: 328415 Comm: repro_36674776 Not tainted 6.12.0-rc1-syzk
RIP: 0010:x86_pmu_enable_event (arch/x86/events/perf_event.h:1195
    arch/x86/events/core.c:1430)
RSP: 0018:ffff888118009d60 EFLAGS: 00010012
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000034 RSI: 0000000000000000 RDI: 00000000000001a0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: ffff88811802a440 R14: ffff88811802a240 R15: ffff8881132d8601
FS:  00007f097dfaa700(0000) GS:ffff888118000000(0000) GS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200001c0 CR3: 0000000103d56000 CR4: 00000000000006f0
Call Trace:
 <IRQ>
amd_pmu_enable_all (arch/x86/events/amd/core.c:760 (discriminator 2))
x86_pmu_enable (arch/x86/events/core.c:1360)
event_sched_out (kernel/events/core.c:1191 kernel/events/core.c:1186
    kernel/events/core.c:2346)
__perf_remove_from_context (kernel/events/core.c:2435)
event_function (kernel/events/core.c:259)
remote_function (kernel/events/core.c:92 (discriminator 1)
    kernel/events/core.c:72 (discriminator 1))
__flush_smp_call_function_queue (./arch/x86/include/asm/jump_label.h:27
    ./include/linux/jump_label.h:207 ./include/trace/events/csd.h:64
    kernel/smp.c:135 kernel/smp.c:540)
__sysvec_call_function_single (./arch/x86/include/asm/jump_label.h:27
    ./include/linux/jump_label.h:207
    ./arch/x86/include/asm/trace/irq_vectors.h:99 arch/x86/kernel/smp.c:272)
sysvec_call_function_single (arch/x86/kernel/smp.c:266 (discriminator 47)
    arch/x86/kernel/smp.c:266 (discriminator 47))
 </IRQ>

Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: George Kennedy <george.kennedy@oracle.com>
---
Ravi requested patch resend with the following:
  /*
   * FIXME: cpuc->events[idx] can become NULL in a subtle race
   * condition with NMI->throttle->x86_pmu_stop().
   */

 arch/x86/events/amd/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 920e3a640cad..3a8b8db878f6 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -762,7 +762,8 @@ static void amd_pmu_enable_all(int added)
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
 
-		amd_pmu_enable_event(cpuc->events[idx]);
+		if (cpuc->events[idx])
+			amd_pmu_enable_event(cpuc->events[idx]);
 	}
 }
 
-- 
2.39.3
Re: [PATCH] [PATCH v2] perf/x86/amd: check event before enable to avoid GPF
Posted by Ravi Bangoria 1 month, 3 weeks ago
On 01-Oct-24 1:22 AM, George Kennedy wrote:
> On AMD machines cpuc->events[idx] can become NULL in a subtle race
> condition with NMI->throttle->x86_pmu_stop().
> 
> Check event for NULL in amd_pmu_enable_all() before enable to avoid a GPF.
> This appears to be an AMD only issue.
> 
> Syzkaller reported a GPF in amd_pmu_enable_all.
> 
> INFO: NMI handler (perf_event_nmi_handler) took too long to run: 13.143
>     msecs
> Oops: general protection fault, probably for non-canonical address
>     0xdffffc0000000034: 0000  PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000001a0-0x00000000000001a7]
> CPU: 0 UID: 0 PID: 328415 Comm: repro_36674776 Not tainted 6.12.0-rc1-syzk
> RIP: 0010:x86_pmu_enable_event (arch/x86/events/perf_event.h:1195
>     arch/x86/events/core.c:1430)
> RSP: 0018:ffff888118009d60 EFLAGS: 00010012
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000034 RSI: 0000000000000000 RDI: 00000000000001a0
> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: ffff88811802a440 R14: ffff88811802a240 R15: ffff8881132d8601
> FS:  00007f097dfaa700(0000) GS:ffff888118000000(0000) GS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200001c0 CR3: 0000000103d56000 CR4: 00000000000006f0
> Call Trace:
>  <IRQ>
> amd_pmu_enable_all (arch/x86/events/amd/core.c:760 (discriminator 2))
> x86_pmu_enable (arch/x86/events/core.c:1360)
> event_sched_out (kernel/events/core.c:1191 kernel/events/core.c:1186
>     kernel/events/core.c:2346)
> __perf_remove_from_context (kernel/events/core.c:2435)
> event_function (kernel/events/core.c:259)
> remote_function (kernel/events/core.c:92 (discriminator 1)
>     kernel/events/core.c:72 (discriminator 1))
> __flush_smp_call_function_queue (./arch/x86/include/asm/jump_label.h:27
>     ./include/linux/jump_label.h:207 ./include/trace/events/csd.h:64
>     kernel/smp.c:135 kernel/smp.c:540)
> __sysvec_call_function_single (./arch/x86/include/asm/jump_label.h:27
>     ./include/linux/jump_label.h:207
>     ./arch/x86/include/asm/trace/irq_vectors.h:99 arch/x86/kernel/smp.c:272)
> sysvec_call_function_single (arch/x86/kernel/smp.c:266 (discriminator 47)
>     arch/x86/kernel/smp.c:266 (discriminator 47))
>  </IRQ>
> 
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Signed-off-by: George Kennedy <george.kennedy@oracle.com>

Peter, Ingo,

I've been blocking this for quite some time without acting on it. Since
the patch is trivial and harmless, I'm giving an Ack here. However, the
underlying race condition is still unknown, so I will get back to it.

Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>

> ---
> Ravi requested patch resend with the following:
>   /*
>    * FIXME: cpuc->events[idx] can become NULL in a subtle race
>    * condition with NMI->throttle->x86_pmu_stop().
>    */

George, the comment should be inside the code.

> 
>  arch/x86/events/amd/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 920e3a640cad..3a8b8db878f6 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -762,7 +762,8 @@ static void amd_pmu_enable_all(int added)
>  		if (!test_bit(idx, cpuc->active_mask))
>  			continue;
>  
> -		amd_pmu_enable_event(cpuc->events[idx]);
> +		if (cpuc->events[idx])
> +			amd_pmu_enable_event(cpuc->events[idx]);
>  	}
>  }
>  

Thanks,
Ravi