[PATCH v3] perf/core: Fix warning due to unordred pmu_ctx_list

Luo Gengkun posted 1 patch 11 months ago
Posted by Luo Gengkun 11 months ago
Syzkaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). The vmcore shows that the two lists
contain the same perf_event_pmu_contexts, but not in the same order.

The problem is that the order of pmu_ctx_list for the parent is determined
by the time at which each event/pmu is added, while the order for a child
is determined by the event order in pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may differ.

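To fix this problem, keep pmu_ctx_list sorted by pmu type: while iterating
over pmu_ctx_list, remember the first entry with a larger type, and insert
the new perf_event_pmu_context in front of it (or at the tail if there is
no such entry).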
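
As a rough userspace illustration of why sorted insertion helps (not the
kernel code itself), the sketch below inserts the same set of type ids into
two lists in different arrival orders. The struct node type, the helper
names and the type values 1, 2 and 4 are all assumptions made up for this
example; only the "first entry with a larger type, else tail" rule is taken
from the patch.

```c
#include <stddef.h>

/* Minimal userspace stand-in for a list entry (illustrative only). */
struct node {
	struct node *prev, *next;
	int type;	/* stands in for epc->pmu->type */
};

/* Make 'head' an empty circular list. */
static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Link 'entry' immediately before 'pos'; with pos == head this appends. */
static void insert_before(struct node *entry, struct node *pos)
{
	entry->prev = pos->prev;
	entry->next = pos;
	pos->prev->next = entry;
	pos->prev = entry;
}

/* Insert in front of the first entry with a larger type, else at the tail. */
static void insert_sorted(struct node *head, struct node *entry)
{
	struct node *pos = NULL, *it;

	for (it = head->next; it != head; it = it->next) {
		if (!pos && it->type > entry->type)
			pos = it;
	}
	insert_before(entry, pos ? pos : head);
}

/* Copy the types in list order into out[]; returns the number written. */
static size_t list_types(const struct node *head, int *out, size_t max)
{
	const struct node *it;
	size_t n = 0;

	for (it = head->next; it != head && n < max; it = it->next)
		out[n++] = it->type;
	return n;
}
```

With this rule, inserting types in the order 4, 1, 2 and in the order
2, 4, 1 both leave the list as 1, 2, 4, so a later lockstep walk over the
two lists sees matching entries at every position.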

The follow testcase can trigger above warning:

 # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
 # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out

test.c

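#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        /* volatile keeps the busy loops from being optimized away */
        volatile unsigned int count = 0;
        pid_t pid;

        /* print the pid, then wait so that perf stat -p can attach */
        printf("%d running\n", getpid());
        sleep(30);
        printf("running\n");

        pid = fork();
        if (pid == -1) {
                printf("fork error\n");
                return 1;
        }
        /* parent and child both spin so that both keep being scheduled */
        if (pid == 0) {
                while (1) {
                        count++;
                }
        } else {
                while (1) {
                        count++;
                }
        }
}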

The testcase first open a lbr event, so it will alloc task_ctx_data, and
then open tracepoint and software events, so the parent ctx will have 3
different perf_event_pmu_contexts. When doing inherit, child ctx will
insert the perf_event_pmu_context in another order then the warning will
trigger.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
---
1. update commit message.
2. modify annotation style.
3. add reviewed by.
Link to v2: https://lore.kernel.org/all/20250121130802.1813928-1-luogengkun@huaweicloud.com/
Link to v1: https://lore.kernel.org/all/20250120114344.632474-1-luogengkun@huaweicloud.com

---
 kernel/events/core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 065f9188b44a..3f68fbbf3de0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4950,7 +4950,7 @@ static struct perf_event_pmu_context *
 find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 		     struct perf_event *event)
 {
-	struct perf_event_pmu_context *new = NULL, *epc;
+	struct perf_event_pmu_context *new = NULL, *pos = NULL, *epc;
 	void *task_ctx_data = NULL;
 
 	if (!ctx->task) {
@@ -5007,12 +5007,19 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 			atomic_inc(&epc->refcount);
 			goto found_epc;
 		}
+		/* Make sure the pmu_ctx_list is sorted by pmu */
+		if (!pos && epc->pmu->type > pmu->type)
+			pos = epc;
 	}
 
 	epc = new;
 	new = NULL;
 
-	list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+	if (!pos)
+		list_add_tail(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+	else
+		list_add(&epc->pmu_ctx_entry, pos->pmu_ctx_entry.prev);
+
 	epc->ctx = ctx;
 
 found_epc:
-- 
2.34.1
Re: [PATCH v3] perf/core: Fix warning due to unordred pmu_ctx_list
Posted by Markus Elfring 10 months, 2 weeks ago
…
> The follow testcase can trigger above warning:

      following test case?


…
> The testcase first open a lbr event, so it will alloc task_ctx_data, and

      test case?                                  allocate?


> then open tracepoint and software events, so the parent ctx will have 3

                                                          context?


> different perf_event_pmu_contexts. When doing inherit, child ctx will
…

                                                inheritance?   context?


Please avoid a typo also in the summary phrase.

Regards,
Markus
Re: [PATCH v3] perf/core: Fix warning due to unordred pmu_ctx_list
Posted by Luo Gengkun 10 months, 2 weeks ago
On 2025/1/22 15:33, Luo Gengkun wrote:
> [...]

Ping. Does this patch look ready? If so, perhaps it can be merged.

Thanks,

Gengkun

[tip: perf/urgent] perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
Posted by tip-bot2 for Luo Gengkun 9 months, 4 weeks ago
The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     2016066c66192a99d9e0ebf433789c490a6785a2
Gitweb:        https://git.kernel.org/tip/2016066c66192a99d9e0ebf433789c490a6785a2
Author:        Luo Gengkun <luogengkun@huaweicloud.com>
AuthorDate:    Wed, 22 Jan 2025 07:33:56 
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 24 Feb 2025 19:22:37 +01:00

perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list

Syzkaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). The vmcore shows that the two lists
contain the same perf_event_pmu_contexts, but not in the same order.

The problem is that the order of pmu_ctx_list for the parent is determined
by the time at which each event/PMU is added, while the order for a child
is determined by the event order in pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may differ.
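
The mismatch can be pictured with a small userspace sketch. It assumes that
the swap walks both lists in lockstep and compares the PMU at each position,
which is what the prev_epc->pmu != next_epc->pmu condition suggests. The
struct ctx_order type, the first_mismatch() helper and the type ids in the
usage note are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a pmu_ctx_list: an array of PMU type ids in
 * list order. These names are illustrative, not kernel API.
 */
struct ctx_order {
	const int *types;
	size_t len;
};

/*
 * Walk both orders in lockstep and return the index of the first position
 * where the PMU types differ, or -1 if every compared position matches.
 */
static int first_mismatch(struct ctx_order prev, struct ctx_order next)
{
	size_t i;

	for (i = 0; i < prev.len && i < next.len; i++) {
		if (prev.types[i] != next.types[i])
			return (int)i;
	}
	return -1;
}
```

For a parent order of 4, 2, 1 and a child order of 4, 1, 2, the walk stops
at index 1 even though both lists hold the same three entries, which is the
situation the warning reports.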

To fix this problem, insert the perf_event_pmu_context to its proper place
after iteration of the pmu_ctx_list.

The following testcase can trigger the above warning:

 # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
 # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out

 test.c

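 #include <stdio.h>
 #include <sys/types.h>
 #include <unistd.h>

 int main(void)
 {
        /* volatile keeps the busy loops from being optimized away */
        volatile unsigned int count = 0;
        pid_t pid;

        /* print the pid, then wait so that perf stat -p can attach */
        printf("%d running\n", getpid());
        sleep(30);
        printf("running\n");

        pid = fork();
        if (pid == -1) {
                printf("fork error\n");
                return 1;
        }
        /* parent and child both spin so that both keep being scheduled */
        if (pid == 0) {
                while (1) {
                        count++;
                }
        } else {
                while (1) {
                        count++;
                }
        }
 }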

The testcase first opens an LBR event, so it will allocate task_ctx_data,
and then opens tracepoint and software events, so the parent context will
have 3 different perf_event_pmu_contexts. On inheritance, the child context
will insert its perf_event_pmu_contexts in a different order, and the
warning will trigger.

[ mingo: Tidied up the changelog. ]

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250122073356.1824736-1-luogengkun@huaweicloud.com
---
 kernel/events/core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7dabbca..086d46d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4950,7 +4950,7 @@ static struct perf_event_pmu_context *
 find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 		     struct perf_event *event)
 {
-	struct perf_event_pmu_context *new = NULL, *epc;
+	struct perf_event_pmu_context *new = NULL, *pos = NULL, *epc;
 	void *task_ctx_data = NULL;
 
 	if (!ctx->task) {
@@ -5007,12 +5007,19 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 			atomic_inc(&epc->refcount);
 			goto found_epc;
 		}
+		/* Make sure the pmu_ctx_list is sorted by PMU type: */
+		if (!pos && epc->pmu->type > pmu->type)
+			pos = epc;
 	}
 
 	epc = new;
 	new = NULL;
 
-	list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+	if (!pos)
+		list_add_tail(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+	else
+		list_add(&epc->pmu_ctx_entry, pos->pmu_ctx_entry.prev);
+
 	epc->ctx = ctx;
 
 found_epc:
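
A note on the insertion call in the hunk above: list_add() links the new
entry right after the node it is given, so passing pos->pmu_ctx_entry.prev
places the new entry immediately before pos. The userspace sketch below
re-implements the two list helpers with the same semantics as the kernel
ones to show this. The link_between() and init_list() helpers are
simplified stand-ins written for this example, not the kernel source.

```c
struct list_head {
	struct list_head *next, *prev;
};

/* Make 'head' an empty circular list. */
static void init_list(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Link 'entry' between two adjacent nodes 'prev' and 'next'. */
static void link_between(struct list_head *entry, struct list_head *prev,
			 struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	link_between(entry, head, head->next);
}

/* Insert 'entry' right before 'head'. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	link_between(entry, head->prev, head);
}
```

Under these definitions, list_add(entry, pos->prev) and
list_add_tail(entry, pos) produce the same links, so either spelling puts
the new entry directly in front of pos.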