[PATCH] perf: fix GPF in perf_cgroup_switch()

Song Liu posted 1 patch 4 years, 4 months ago
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[PATCH] perf: fix GPF in perf_cgroup_switch()
Posted by Song Liu 4 years, 4 months ago
GPF is observed in perf_cgroup_switch():

[ 2683.232477] general protection fault, probably for non-canonical address 0xdeacffffffffff90: 0000 [#1] SMP
[ 2683.251802] CPU: 30 PID: 0 Comm: swapper/30 Kdump: loaded Tainted: G S
[ 2683.273726] Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP,
[ 2683.291129] RIP: 0010:perf_cgroup_switch+0xc0/0x170
[ 2683.300889] Code: 85 ff ff 48 8b 83 00 01 00 00 48 85 c0 74 04 c6 40 08 \
         00 c6 43 08 00 48 8b 83 70 01 00 00 48 8d 98 90 fe ff ff 48 39 c5 \
         74 7d <8b> 83 e4 00 00 00 85 c0 0f 84 89 00 00 00 4c 8b bb 00 01 00 00 48
[ 2683.338455] RSP: 0018:ffffc9000021fdb0 EFLAGS: 00010002
[ 2683.348904] RAX: dead000000000100 RBX: deacffffffffff90 RCX: 000000000000038f
[ 2683.363176] RDX: 0000000000000007 RSI: 0000000000000400 RDI: 0000000000000000
[ 2683.377447] RBP: ffff88903ffa77b0 R08: 0000000300000003 R09: 0000000000000004
[ 2683.391718] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000002
[ 2683.405989] R13: 0000000000000000 R14: ffff8881013fdc00 R15: 0000000000000000
[ 2683.420261] FS:  0000000000000000(0000) GS:ffff88903ff80000(0000) knlGS:0000000000000000
[ 2683.436446] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2683.447937] CR2: 00007ffafb956020 CR3: 0000000141cd7005 CR4: 00000000007706e0
[ 2683.462209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2683.476481] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2683.490752] PKRU: 55555554
[ 2683.496160] Call Trace:
[ 2683.501048]  __perf_event_task_sched_in+0xb3/0x200
[ 2683.510632]  finish_task_switch+0x186/0x270
[ 2683.518999]  __schedule+0x3b1/0x850
[ 2683.525973]  ? cpuidle_enter_state+0xa7/0x340
[ 2683.534687]  ? update_ts_time_stats+0x51/0x70
[ 2683.543399]  schedule_idle+0x1e/0x40
[ 2683.550548]  do_idle+0x148/0x200
[ 2683.557001]  cpu_startup_entry+0x19/0x20
[ 2683.564843]  start_secondary+0x104/0x140
[ 2683.572688]  secondary_startup_64_no_verify+0xb0/0xbb

which indicates list corruption on cgrp_cpuctx_list. This happens on the
following path:

  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
      cpu_ctx_sched_in
         ctx_sched_in
            ctx_pinned_sched_in
              merge_sched_in
                  perf_cgroup_event_disable: remove the event from the list

To repro this on Intel CPUs:

  /* occupy all counters with pinned events (watchdog uses another) */
  perf stat -e cycles:D,cycles:D,cycles:D,cycles:D,cycles:D -a &
  /* add a pinned cgroup event */
  perf stat -e cycles:D -G my-cgroup
  /* GPF immediately */

Fix this with list_for_each_entry_safe().

Cc: Yonghong Song <yhs@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index afbf388a5176..46babdf76d8f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -839,7 +839,7 @@ static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list);
  */
 static void perf_cgroup_switch(struct task_struct *task, int mode)
 {
-	struct perf_cpu_context *cpuctx;
+	struct perf_cpu_context *cpuctx, *tmp;
 	struct list_head *list;
 	unsigned long flags;
 
@@ -850,7 +850,7 @@ static void perf_cgroup_switch(struct task_struct *task, int mode)
 	local_irq_save(flags);
 
 	list = this_cpu_ptr(&cgrp_cpuctx_list);
-	list_for_each_entry(cpuctx, list, cgrp_cpuctx_entry) {
+	list_for_each_entry_safe(cpuctx, tmp, list, cgrp_cpuctx_entry) {
 		WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);
 
 		perf_ctx_lock(cpuctx, cpuctx->task_ctx);
-- 
2.30.2

Re: [PATCH] perf: fix GPF in perf_cgroup_switch()
Posted by Peter Zijlstra 4 years, 4 months ago
On Thu, Feb 03, 2022 at 04:40:57PM -0800, Song Liu wrote:
> GPF is observed in perf_cgroup_switch():
> 
> [ 2683.232477] general protection fault, probably for non-canonical address 0xdeacffffffffff90: 0000 [#1] SMP
> [ 2683.251802] CPU: 30 PID: 0 Comm: swapper/30 Kdump: loaded Tainted: G S
> [ 2683.273726] Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP,
> [ 2683.291129] RIP: 0010:perf_cgroup_switch+0xc0/0x170
> [ 2683.300889] Code: 85 ff ff 48 8b 83 00 01 00 00 48 85 c0 74 04 c6 40 08 \
>          00 c6 43 08 00 48 8b 83 70 01 00 00 48 8d 98 90 fe ff ff 48 39 c5 \
>          74 7d <8b> 83 e4 00 00 00 85 c0 0f 84 89 00 00 00 4c 8b bb 00 01 00 00 48
> [ 2683.338455] RSP: 0018:ffffc9000021fdb0 EFLAGS: 00010002
> [ 2683.348904] RAX: dead000000000100 RBX: deacffffffffff90 RCX: 000000000000038f
> [ 2683.363176] RDX: 0000000000000007 RSI: 0000000000000400 RDI: 0000000000000000
> [ 2683.377447] RBP: ffff88903ffa77b0 R08: 0000000300000003 R09: 0000000000000004
> [ 2683.391718] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000002
> [ 2683.405989] R13: 0000000000000000 R14: ffff8881013fdc00 R15: 0000000000000000
> [ 2683.420261] FS:  0000000000000000(0000) GS:ffff88903ff80000(0000) knlGS:0000000000000000
> [ 2683.436446] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2683.447937] CR2: 00007ffafb956020 CR3: 0000000141cd7005 CR4: 00000000007706e0
> [ 2683.462209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2683.476481] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2683.490752] PKRU: 55555554
> [ 2683.496160] Call Trace:
> [ 2683.501048]  __perf_event_task_sched_in+0xb3/0x200
> [ 2683.510632]  finish_task_switch+0x186/0x270
> [ 2683.518999]  __schedule+0x3b1/0x850
> [ 2683.525973]  ? cpuidle_enter_state+0xa7/0x340
> [ 2683.534687]  ? update_ts_time_stats+0x51/0x70
> [ 2683.543399]  schedule_idle+0x1e/0x40
> [ 2683.550548]  do_idle+0x148/0x200
> [ 2683.557001]  cpu_startup_entry+0x19/0x20
> [ 2683.564843]  start_secondary+0x104/0x140
> [ 2683.572688]  secondary_startup_64_no_verify+0xb0/0xbb

Please, don't put splats in changelogs like this, simply put the
relevant information in. Which you've already done below, so the above
is completely redundant.

> 
> which indicates list corruption on cgrp_cpuctx_list. This happens on the
> following path:
> 
>   perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
>       cpu_ctx_sched_in
>          ctx_sched_in
>             ctx_pinned_sched_in
>               merge_sched_in
>                   perf_cgroup_event_disable: remove the event from the list
> 
> To repro this on Intel CPUs:
> 
>   /* occupy all counters with pinned events (watchdog uses another) */
>   perf stat -e cycles:D,cycles:D,cycles:D,cycles:D,cycles:D -a &
>   /* add a pinned cgroup event */
>   perf stat -e cycles:D -G my-cgroup
>   /* GPF immediately */

Luckily this requires root, but please don't hand reproducers like this
to the kiddies :/

> Fix this with list_for_each_entry_safe().

*sigh*

You forgot Fixes, I'm thinking this is due to commit: 058fe1c0440e6.

With all that, I seem to end up with the below.

---
Subject: perf: Fix list corruption in perf_cgroup_switch()
From: Song Liu <song@kernel.org>
Date: Thu, 3 Feb 2022 16:40:57 -0800

From: Song Liu <song@kernel.org>

There's list corruption on cgrp_cpuctx_list. This happens on the
following path:

  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
      cpu_ctx_sched_in
         ctx_sched_in
            ctx_pinned_sched_in
              merge_sched_in
                  perf_cgroup_event_disable: remove the event from the list

Use list_for_each_entry_safe() to allow removing an entry during
iteration.

Fixes: 058fe1c0440e ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
---

does that work for you?
Re: [PATCH] perf: fix GPF in perf_cgroup_switch()
Posted by Song Liu 4 years, 4 months ago
> On Feb 6, 2022, at 1:34 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Feb 03, 2022 at 04:40:57PM -0800, Song Liu wrote:
>> GPF is observed in perf_cgroup_switch():
>> 
>> [ 2683.232477] general protection fault, probably for non-canonical address 0xdeacffffffffff90: 0000 [#1] SMP
>> [ 2683.251802] CPU: 30 PID: 0 Comm: swapper/30 Kdump: loaded Tainted: G S
>> [ 2683.273726] Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP,
>> [ 2683.291129] RIP: 0010:perf_cgroup_switch+0xc0/0x170
>> [ 2683.300889] Code: 85 ff ff 48 8b 83 00 01 00 00 48 85 c0 74 04 c6 40 08 \
>>         00 c6 43 08 00 48 8b 83 70 01 00 00 48 8d 98 90 fe ff ff 48 39 c5 \
>>         74 7d <8b> 83 e4 00 00 00 85 c0 0f 84 89 00 00 00 4c 8b bb 00 01 00 00 48
>> [ 2683.338455] RSP: 0018:ffffc9000021fdb0 EFLAGS: 00010002
>> [ 2683.348904] RAX: dead000000000100 RBX: deacffffffffff90 RCX: 000000000000038f
>> [ 2683.363176] RDX: 0000000000000007 RSI: 0000000000000400 RDI: 0000000000000000
>> [ 2683.377447] RBP: ffff88903ffa77b0 R08: 0000000300000003 R09: 0000000000000004
>> [ 2683.391718] R10: 0000000000000003 R11: 0000000000000001 R12: 0000000000000002
>> [ 2683.405989] R13: 0000000000000000 R14: ffff8881013fdc00 R15: 0000000000000000
>> [ 2683.420261] FS:  0000000000000000(0000) GS:ffff88903ff80000(0000) knlGS:0000000000000000
>> [ 2683.436446] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2683.447937] CR2: 00007ffafb956020 CR3: 0000000141cd7005 CR4: 00000000007706e0
>> [ 2683.462209] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 2683.476481] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 2683.490752] PKRU: 55555554
>> [ 2683.496160] Call Trace:
>> [ 2683.501048]  __perf_event_task_sched_in+0xb3/0x200
>> [ 2683.510632]  finish_task_switch+0x186/0x270
>> [ 2683.518999]  __schedule+0x3b1/0x850
>> [ 2683.525973]  ? cpuidle_enter_state+0xa7/0x340
>> [ 2683.534687]  ? update_ts_time_stats+0x51/0x70
>> [ 2683.543399]  schedule_idle+0x1e/0x40
>> [ 2683.550548]  do_idle+0x148/0x200
>> [ 2683.557001]  cpu_startup_entry+0x19/0x20
>> [ 2683.564843]  start_secondary+0x104/0x140
>> [ 2683.572688]  secondary_startup_64_no_verify+0xb0/0xbb
> 
> Please, don't put splats in changelogs like this, simply put the
> relevant information in. Which you've already done below, so the above
> is completely redundant.
> 
>> 
>> which indicates list corruption on cgrp_cpuctx_list. This happens on the
>> following path:
>> 
>>  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
>>      cpu_ctx_sched_in
>>         ctx_sched_in
>>            ctx_pinned_sched_in
>>              merge_sched_in
>>                  perf_cgroup_event_disable: remove the event from the list
>> 
>> To repro this on Intel CPUs:
>> 
>>  /* occupy all counters with pinned events (watchdog uses another) */
>>  perf stat -e cycles:D,cycles:D,cycles:D,cycles:D,cycles:D -a &
>>  /* add a pinned cgroup event */
>>  perf stat -e cycles:D -G my-cgroup
>>  /* GPF immediately */
> 
> Luckily this requires root, but please don't hand reproducers like this
> to the kiddies :/
> 
>> Fix this with list_for_each_entry_safe().
> 
> *sigh*
> 
> You forgot Fixes, I'm thinking this is due to commit: 058fe1c0440e6.
> 
> With all that, I seem to end up with the below.
> 
> ---
> Subject: perf: Fix list corruption in perf_cgroup_switch()
> From: Song Liu <song@kernel.org>
> Date: Thu, 3 Feb 2022 16:40:57 -0800
> 
> From: Song Liu <song@kernel.org>
> 
> There's list corruption on cgrp_cpuctx_list. This happens on the
> following path:
> 
>  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
>      cpu_ctx_sched_in
>         ctx_sched_in
>            ctx_pinned_sched_in
>              merge_sched_in
>                  perf_cgroup_event_disable: remove the event from the list
> 
> Use list_for_each_entry_safe() to allow removing an entry during
> iteration.
> 
> Fixes: 058fe1c0440e ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
> Signed-off-by: Song Liu <song@kernel.org>
> Reviewed-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
> ---
> 
> does that work for you?

Yes, this works! Thanks for fixing up things. 

Song
Re: [PATCH] perf: fix GPF in perf_cgroup_switch()
Posted by Rik van Riel 4 years, 4 months ago
On Thu, 2022-02-03 at 16:40 -0800, Song Liu wrote:
> Fix this with list_for_each_entry_safe().
> 
> Cc: Yonghong Song <yhs@fb.com>
> Cc: Rik van Riel <riel@surriel.com>
> Signed-off-by: Song Liu <song@kernel.org>

Reviewed-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.
[tip: perf/urgent] perf: Fix list corruption in perf_cgroup_switch()
Posted by tip-bot2 for Song Liu 4 years, 4 months ago
The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     5f4e5ce638e6a490b976ade4a40017b40abb2da0
Gitweb:        https://git.kernel.org/tip/5f4e5ce638e6a490b976ade4a40017b40abb2da0
Author:        Song Liu <song@kernel.org>
AuthorDate:    Thu, 03 Feb 2022 16:40:57 -08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sun, 06 Feb 2022 22:37:27 +01:00

perf: Fix list corruption in perf_cgroup_switch()

There's list corruption on cgrp_cpuctx_list. This happens on the
following path:

  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
      cpu_ctx_sched_in
         ctx_sched_in
            ctx_pinned_sched_in
              merge_sched_in
                  perf_cgroup_event_disable: remove the event from the list

Use list_for_each_entry_safe() to allow removing an entry during
iteration.

Fixes: 058fe1c0440e ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
---
 kernel/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57c7197..6859229 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -839,7 +839,7 @@ static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list);
  */
 static void perf_cgroup_switch(struct task_struct *task, int mode)
 {
-	struct perf_cpu_context *cpuctx;
+	struct perf_cpu_context *cpuctx, *tmp;
 	struct list_head *list;
 	unsigned long flags;
 
@@ -850,7 +850,7 @@ static void perf_cgroup_switch(struct task_struct *task, int mode)
 	local_irq_save(flags);
 
 	list = this_cpu_ptr(&cgrp_cpuctx_list);
-	list_for_each_entry(cpuctx, list, cgrp_cpuctx_entry) {
+	list_for_each_entry_safe(cpuctx, tmp, list, cgrp_cpuctx_entry) {
 		WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);
 
 		perf_ctx_lock(cpuctx, cpuctx->task_ctx);