[PATCH] perf stat: Fix crash on arm64

Posted by Breno Leitao 1 day, 17 hours ago
Perf stat is crashing on arm64 hosts with the following issue:

	# make -C tools/perf DEBUG=1
	# perf stat sleep 1
	perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
	[1]    1220794 IOT instruction (core dumped)  ./perf stat

The sorting function introduced by commit a745c0831c15c ("perf stat:
Sort default events/metrics") compares events based on their individual
properties. This can cause events from different groups to be
interleaved, resulting in group members appearing before their leaders
in the sorted evlist.

When the iterator opens events in list order, a group member may be
processed before its leader has been opened.

For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
could be sorted before its leader, causing the crash when CPU_CYCLES
tries to get its group fd from the not-yet-opened leader.
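To make the failure mode concrete, here is a minimal sketch under a simplified model. `struct toy_event` and `open_in_order()` are hypothetical, the fd numbers are fake, and no perf code or syscalls are involved. Events are opened in list order. A leader needs no group fd, but a member needs its leader's fd, so a member that comes first has nothing valid to pass. That is the point where the real get_group_fd() asserts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evsel, keeping only what this sketch needs. */
struct toy_event {
	const char *name;
	const struct toy_event *leader; /* points to itself for a group leader */
	int fd;                         /* -1 until "opened" */
};

/*
 * Open events in list order. A leader is opened on its own; a member must
 * pass its leader's fd, so the leader has to be open already. Where perf's
 * get_group_fd() asserts, this sketch returns the failing list position
 * instead. Returns -1 if the whole list opens cleanly.
 */
static int open_in_order(struct toy_event **list, size_t n)
{
	int next_fd = 3; /* fake fd numbers, starting after stdin/stdout/stderr */

	for (size_t i = 0; i < n; i++) {
		struct toy_event *ev = list[i];

		assert(ev->leader != NULL);
		if (ev->leader != ev && ev->leader->fd < 0)
			return (int)i; /* leader not opened yet: perf would assert here */
		ev->fd = next_fd++;
	}
	return -1;
}
```

With the list ordered { CPU_CYCLES, STALL_SLOT_BACKEND }, the sketch fails at position 0. With the leader first, both events open.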

Fix this by comparing events based on their leader's attributes instead
of their own attributes when the events are in different groups. This
ensures all members of a group share the same sort key as their leader,
keeping groups together and guaranteeing leaders are opened before their
members.
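The following standalone sketch shows how comparing each event's own name can pull a member ahead of its leader during a merge, and how comparing the leaders' keys keeps the group together. It is an illustration under stated assumptions, not perf code:

- Only the name key is modeled. default_metricgroup, default_show_events and the PMU are assumed to compare equal.
- A plain top-down stable merge sort stands in for list_sort().
- INST_RETIRED is an invented singleton event.
- `struct toy_evsel`, `cmp_own()`, `cmp_leader()`, `merge_sort()` and `leaders_first()` are hypothetical names.

The exact interleaving on a real system depends on list_sort() and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for struct evsel. */
struct toy_evsel {
	const char *name;
	int idx;                        /* original position, like core.idx */
	const struct toy_evsel *leader; /* points to itself for a leader */
};

typedef int (*toy_cmp_fn)(const struct toy_evsel *, const struct toy_evsel *);

/* Pre-patch shape: across groups, compare the events' own keys (name only here). */
static int cmp_own(const struct toy_evsel *l, const struct toy_evsel *r)
{
	if (l->leader == r->leader)
		return l->idx - r->idx; /* same group: keep original order */
	return strcmp(l->name, r->name);
}

/* Patched shape: across groups, compare the leaders' keys instead. */
static int cmp_leader(const struct toy_evsel *l, const struct toy_evsel *r)
{
	if (l->leader == r->leader)
		return l->idx - r->idx;
	return strcmp(l->leader->name, r->leader->name);
}

/* Stable top-down merge sort of pointers; tmp must have room for n entries. */
static void merge_sort(const struct toy_evsel **a, const struct toy_evsel **tmp,
		       size_t n, toy_cmp_fn cmp)
{
	size_t mid, i, j, k = 0;

	if (n < 2)
		return;
	mid = n / 2;
	merge_sort(a, tmp, mid, cmp);
	merge_sort(a + mid, tmp, n - mid, cmp);
	i = 0;
	j = mid;
	while (i < mid && j < n) /* "<= 0" takes from the left run on ties: stable */
		tmp[k++] = cmp(a[i], a[j]) <= 0 ? a[i++] : a[j++];
	while (i < mid)
		tmp[k++] = a[i++];
	while (j < n)
		tmp[k++] = a[j++];
	memcpy(a, tmp, n * sizeof(*a));
}

/* True if every event's leader appears at or before it, so in-order opening is safe. */
static bool leaders_first(const struct toy_evsel **a, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		assert(a[i]->leader != NULL);
		for (size_t j = 0; j <= i && !seen; j++)
			seen = (a[j] == a[i]->leader);
		if (!seen)
			return false;
	}
	return true;
}
```

Start from the list { INST_RETIRED, STALL_SLOT_BACKEND, CPU_CYCLES, BR_MIS_PRED }, where the last three form one group led by STALL_SLOT_BACKEND.

- `cmp_own` yields CPU_CYCLES, BR_MIS_PRED, INST_RETIRED, STALL_SLOT_BACKEND, with the members ahead of their leader.
- `cmp_leader` yields INST_RETIRED, STALL_SLOT_BACKEND, CPU_CYCLES, BR_MIS_PRED.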

Reported-by: Denis Yaroshevskiy <dyaroshev@meta.com>
Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Cc: linux-arm-kernel@lists.infradead.org
---
 tools/perf/builtin-stat.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ab40d85fb1259..3a423ca31d8d3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1938,25 +1938,33 @@ static int default_evlist_evsel_cmp(void *priv __maybe_unused,
 	const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
 	const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
 	const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
+	const struct evsel *lhs_leader = evsel__leader(lhs);
+	const struct evsel *rhs_leader = evsel__leader(rhs);
 
-	if (evsel__leader(lhs) == evsel__leader(rhs)) {
+	if (lhs_leader == rhs_leader) {
 		/* Within the same group, respect the original order. */
 		return lhs_core->idx - rhs_core->idx;
 	}
 
+	/*
+	 * Compare using leader's attributes so that all members of a group
+	 * stay together. This ensures leaders are opened before their members.
+	 */
+
 	/* Sort default metrics evsels first, and default show events before those. */
-	if (lhs->default_metricgroup != rhs->default_metricgroup)
-		return lhs->default_metricgroup ? -1 : 1;
+	if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
+		return lhs_leader->default_metricgroup ? -1 : 1;
 
-	if (lhs->default_show_events != rhs->default_show_events)
-		return lhs->default_show_events ? -1 : 1;
+	if (lhs_leader->default_show_events != rhs_leader->default_show_events)
+		return lhs_leader->default_show_events ? -1 : 1;
 
 	/* Sort by PMU type (prefers legacy types first). */
-	if (lhs->pmu != rhs->pmu)
-		return lhs->pmu->type - rhs->pmu->type;
+	if (lhs_leader->pmu != rhs_leader->pmu)
+		return lhs_leader->pmu->type - rhs_leader->pmu->type;
 
-	/* Sort by name. */
-	return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
+	/* Sort by leader's name. */
+	return strcmp(evsel__name((struct evsel *)lhs_leader),
+		      evsel__name((struct evsel *)rhs_leader));
 }
 
 /*

---
base-commit: 5fd0a1df5d05ad066e5618ccdd3d0fa6cb686c27
change-id: 20260205-perf_stat-a0a2a37e21c5

Best regards,
--  
Breno Leitao <leitao@debian.org>
Re: [PATCH] perf stat: Fix crash on arm64
Posted by Ian Rogers 1 day, 12 hours ago
On Thu, Feb 5, 2026 at 3:46 AM Breno Leitao <leitao@debian.org> wrote:
>
> Perf stat is crashing on arm64 hosts with the following issue:
>
>         # make -C tools/perf DEBUG=1
>         # perf stat sleep 1
>         perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
>         [1]    1220794 IOT instruction (core dumped)  ./perf stat
>
> The sorting function introduced by commit a745c0831c15c ("perf stat:
> Sort default events/metrics") compares events based on their individual
> properties. This can cause events from different groups to be
> interleaved, resulting in group members appearing before their leaders
> in the sorted evlist.

Hi, sorry for the issue. I can see what you're saying but why is this
an arm64 issue? The legacy Default metrics are common to all
architectures:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next

> When the iterator opens events in list order, a group member may be
> processed before its leader has been opened.
>
> For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> could be sorted before its leader, causing the crash when CPU_CYCLES
> tries to get its group fd from the not-yet-opened leader.

Which metric is this?

> Fix this by comparing events based on their leader's attributes instead
> of their own attributes when the events are in different groups. This
> ensures all members of a group share the same sort key as their leader,
> keeping groups together and guaranteeing leaders are opened before their
> members.

This makes sense but I'm not understanding why this problem wasn't
seen previously. I'm guessing that in a metric like
backend_cycles_idle:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next#n63
```
        "BriefDescription": "Backend stalls per cycle",
        "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
        "MetricGroup": "Default",
        "MetricName": "backend_cycles_idle",
        "MetricThreshold": "backend_cycles_idle > 0.2",
        "DefaultShowEvents": "1"
```
The PMUs for cpu-cycles and stalled-cycles differ? This may mean we
also need to be smarter in determining PMUs for legacy events.

It'd be interesting to see what events are coming from the kernel, e.g.:
```
$ ls /sys/bus/event_source/devices/*/events
/sys/bus/event_source/devices/cpu_atom/events:
branch-instructions  cache-misses      instructions  ref-cycles        topdown-fe-bound
branch-misses        cache-references  mem-loads     topdown-bad-spec  topdown-retiring
bus-cycles           cpu-cycles        mem-stores    topdown-be-bound
...
```
and the cpuid to match it up with the json.
```
$ perf stat -v sleep 1 2>&1 |head -1
Using CPUID GenuineIntel-6-B7-1
$ ./tools/perf/pmu-events/models.py x86 GenuineIntel-6-B7-1
tools/perf/pmu-events/arch/
alderlake
```
This information is in the verbose output too:
```
$ perf stat -vv sleep 1
...
------------------------------------------------------------
perf_event_attr:
 type                             1 (PERF_TYPE_SOFTWARE)
 size                             136
 config                           0x2 (PERF_COUNT_SW_PAGE_FAULTS)
 sample_type                      IDENTIFIER
 read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
 disabled                         1
 inherit                          1
 enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 608809  cpu -1  group_fd -1  flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0xa00000001 (cpu_atom/PERF_COUNT_HW_INSTRUCTIONS/)
 sample_type                      IDENTIFIER
 read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
 disabled                         1
 inherit                          1
 enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 608809  cpu -1  group_fd -1  flags 0x8 = 8
...
```
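Every event in the output above is opened with `group_fd -1`, i.e. as the leader of its own group. The `flags 0x8` value is PERF_FLAG_FD_CLOEXEC.

For comparison, here is a minimal sketch of opening a two-event group with perf_event_open(2). The leader is opened first with group_fd -1, and the member then passes the leader's fd. That leader fd is exactly what is missing when a member is processed before its leader.

The software events (task-clock leading page-faults) and the exclude_kernel/exclude_hv bits are assumptions, chosen so the sketch runs without a hardware PMU. Depending on perf_event_paranoid or a seccomp policy, it may still fail with EACCES or EPERM. `perf_open()`, `sw_attr()` and `open_sw_group()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper, so invoke the syscall directly. */
static int perf_open(struct perf_event_attr *attr, int group_fd)
{
	assert(attr->size == sizeof(*attr));
	/* pid 0 = calling thread, cpu -1 = any CPU, flags 0x8 as in the log. */
	return (int)syscall(SYS_perf_event_open, attr, 0, -1, group_fd,
			    PERF_FLAG_FD_CLOEXEC);
}

static void sw_attr(struct perf_event_attr *attr, unsigned long long config,
		    int disabled)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = config;
	attr->disabled = disabled;
	/* User-space only, so perf_event_paranoid <= 2 does not reject it. */
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
}

/* Open a leader, then one member in its group. Returns 0 or -errno. */
static int open_sw_group(void)
{
	struct perf_event_attr leader_attr, member_attr;
	int leader_fd, member_fd, ret = 0;

	sw_attr(&leader_attr, PERF_COUNT_SW_TASK_CLOCK, 1);
	/* Members are typically left enabled; they count while the leader does. */
	sw_attr(&member_attr, PERF_COUNT_SW_PAGE_FAULTS, 0);

	leader_fd = perf_open(&leader_attr, -1); /* -1: start a new group */
	if (leader_fd < 0)
		return -errno;

	/* The member must name an already-open leader fd as its group_fd. */
	member_fd = perf_open(&member_attr, leader_fd);
	if (member_fd < 0)
		ret = -errno;
	else
		close(member_fd);
	close(leader_fd);
	return ret;
}
```

`open_sw_group()` returns 0 when both opens succeed. Otherwise it returns the negated errno of the first failing open.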

Thanks,
Ian

Re: [PATCH] perf stat: Fix crash on arm64
Posted by Breno Leitao 17 hours ago
Hello Ian, thanks for the quick reply!

On Thu, Feb 05, 2026 at 08:59:07AM -0800, Ian Rogers wrote:
> On Thu, Feb 5, 2026 at 3:46 AM Breno Leitao <leitao@debian.org> wrote:
> >
> > Perf stat is crashing on arm64 hosts with the following issue:
> >
> >         # make -C tools/perf DEBUG=1
> >         # perf stat sleep 1
> >         perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> >         [1]    1220794 IOT instruction (core dumped)  ./perf stat
> >
> > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > Sort default events/metrics") compares events based on their individual
> > properties. This can cause events from different groups to be
> > interleaved, resulting in group members appearing before their leaders
> > in the sorted evlist.
> 
> Hi, sorry for the issue. I can see what you're saying but why is this
> an arm64 issue?

Sorry, it's not ARM64-specific: the bug is in the generic sort code.
It just happens to manifest on ARM64.

> The legacy Default metrics are common to all
> architectures:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> 
> > When the iterator opens events in list order, a group member may be
> > processed before its leader has been opened.
> >
> > For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> > could be sorted before its leader, causing the crash when CPU_CYCLES
> > tries to get its group fd from the not-yet-opened leader.
> 
> Which metric is this?

These are ARM Neoverse metrics; they can be found in
tools/perf/pmu-events/arch/arm64/arm/neoverse-n*

> > Fix this by comparing events based on their leader's attributes instead
> > of their own attributes when the events are in different groups. This
> > ensures all members of a group share the same sort key as their leader,
> > keeping groups together and guaranteeing leaders are opened before their
> > members.
> 
> This makes sense but I'm not understanding why this problem wasn't
> seen previously. I'm guessing that in a metric like
> backend_cycles_idle:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next#n63
> ```
>         "BriefDescription": "Backend stalls per cycle",
>         "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
>         "MetricGroup": "Default",
>         "MetricName": "backend_cycles_idle",
>         "MetricThreshold": "backend_cycles_idle > 0.2",
>         "DefaultShowEvents": "1"
> ```

I was able to narrow this down to the following JSON:

  [
      {
          "ArchStdEvent": "backend_bound",
          "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
      },
      {
          "ArchStdEvent": "frontend_bound",
          "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND) / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))"
      }
  ]
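For reference, here are the two metric expressions above written out as C. This is a sketch only: the function names are hypothetical, and `slots` stands in for the `#slots` literal that perf resolves per CPU, so no real value is assumed for it. Each metric combines a STALL_SLOT_* counter with CPU_CYCLES and BR_MIS_PRED. That is presumably why perf parses each one as a single weak group (the `{...}:W` strings in the log below), with the STALL_SLOT_* event as leader.

```c
#include <assert.h>

/* 100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)) */
static double backend_bound(double stall_slot_backend, double cpu_cycles,
			    double br_mis_pred, double slots)
{
	assert(cpu_cycles > 0 && slots > 0);
	return 100 * (stall_slot_backend / (cpu_cycles * slots) -
		      (br_mis_pred * 3) / cpu_cycles);
}

/* 100 * (((STALL_SLOT_FRONTEND) / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)) */
static double frontend_bound(double stall_slot_frontend, double cpu_cycles,
			     double br_mis_pred, double slots)
{
	assert(cpu_cycles > 0 && slots > 0);
	return 100 * (stall_slot_frontend / (cpu_cycles * slots) -
		      br_mis_pred / cpu_cycles);
}
```

Take made-up counts of 100 cycles, 8 slots and 1 mispredict. A STALL_SLOT_BACKEND count of 200 gives 100 * (0.25 - 0.03) = 22. A STALL_SLOT_FRONTEND count of 400 gives 100 * (0.5 - 0.01) = 49.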

and then

	# ./tools/perf/perf stat -v sleep 0.01
	Using CPUID 0x00000000410fd4f0
	metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound
	metric expr 100 * (STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots) - BR_MIS_PRED / CPU_CYCLES) for frontend_bound
	metric expr (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) / (duration_time * 1e9) for CPUs_utilized
	metric expr stalled\-cycles\-backend / cpu\-cycles for backend_cycles_idle
	metric expr stalled\-cycles\-backend / cpu\-cycles for backend_cycles_idle
	metric expr branches / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for branch_frequency
	metric expr branch\-misses / branches for branch_miss_rate
	metric expr branch\-misses / branches for branch_miss_rate
	metric expr software@context\-switches\,name\=context\-switches@ * 1e9 / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for cs_per_second
	metric expr cpu\-cycles / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for cycles_frequency
	metric expr stalled\-cycles\-frontend / cpu\-cycles for frontend_cycles_idle
	metric expr stalled\-cycles\-frontend / cpu\-cycles for frontend_cycles_idle
	metric expr instructions / cpu\-cycles for insn_per_cycle
	metric expr instructions / cpu\-cycles for insn_per_cycle
	metric expr software@cpu\-migrations\,name\=cpu\-migrations@ * 1e9 / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for migrations_per_second
	metric expr software@page\-faults\,name\=page\-faults@ * 1e9 / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for page_faults_per_second
	metric expr max(stalled\-cycles\-frontend, stalled\-cycles\-backend) / instructions for stalled_cycles_per_instruction
	hwmon_pmu: failure to open '/sys/class/hwmon/hwmon4/name'
	hwmon_pmu: failure to open '/sys/class/hwmon/hwmon5/name'
	hwmon_pmu: failure to open '/sys/class/hwmon/hwmon3/name'
	found event software@context-switches,name=context-switches@
	found event duration_time
	found event software@page-faults,name=page-faults@
	found event software@task-clock,name=task-clock@
	found event cpu-cycles
	found event branches
	found event software@cpu-migrations,name=cpu-migrations@
	Parsing metric events 'software/context-switches,name=context-switches,metric-id=software!3context!1switches!0name!2context!1switches!3/,software/page-faults,name=page-faults,metric-id=software!3page!1faults!0name!2page!1faults!3/,software/task-clock,name=task-clock,metric-id=software!3task!1clock!0name!2task!1clock!3/,cpu-cycles/metric-id=cpu!1cycles/,branches/metric-id=branches/,software/cpu-migrations,name=cpu-migrations,metric-id=software!3cpu!1migrations!0name!2cpu!1migrations!3/,duration_time'
	cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
	branches -> armv8_pmuv3_0/metric-id=branches,branches/
	duration_time -> tool/duration_time/
	found event STALL_SLOT_FRONTEND
	found event duration_time
	found event BR_MIS_PRED
	found event CPU_CYCLES
	Parsing metric events '{STALL_SLOT_FRONTEND/metric-id=STALL_SLOT_FRONTEND/,BR_MIS_PRED/metric-id=BR_MIS_PRED/,CPU_CYCLES/metric-id=CPU_CYCLES/}:W,duration_time'
	STALL_SLOT_FRONTEND -> armv8_pmuv3_0/metric-id=STALL_SLOT_FRONTEND,STALL_SLOT_FRONTEND/
	BR_MIS_PRED -> armv8_pmuv3_0/metric-id=BR_MIS_PRED,BR_MIS_PRED/
	CPU_CYCLES -> armv8_pmuv3_0/metric-id=CPU_CYCLES,CPU_CYCLES/
	duration_time -> tool/duration_time/
	Matched metric-id STALL_SLOT_FRONTEND to STALL_SLOT_FRONTEND
	Matched metric-id BR_MIS_PRED to BR_MIS_PRED
	Matched metric-id CPU_CYCLES to CPU_CYCLES
	Matched metric-id duration_time to duration_time
	found event STALL_SLOT_BACKEND
	found event duration_time
	found event BR_MIS_PRED
	found event CPU_CYCLES
	Parsing metric events '{STALL_SLOT_BACKEND/metric-id=STALL_SLOT_BACKEND/,BR_MIS_PRED/metric-id=BR_MIS_PRED/,CPU_CYCLES/metric-id=CPU_CYCLES/}:W,duration_time'
	STALL_SLOT_BACKEND -> armv8_pmuv3_0/metric-id=STALL_SLOT_BACKEND,STALL_SLOT_BACKEND/
	BR_MIS_PRED -> armv8_pmuv3_0/metric-id=BR_MIS_PRED,BR_MIS_PRED/
	CPU_CYCLES -> armv8_pmuv3_0/metric-id=CPU_CYCLES,CPU_CYCLES/
	duration_time -> tool/duration_time/
	Matched metric-id STALL_SLOT_BACKEND to STALL_SLOT_BACKEND
	Matched metric-id BR_MIS_PRED to BR_MIS_PRED
	Matched metric-id CPU_CYCLES to CPU_CYCLES
	Matched metric-id duration_time to duration_time
	found event duration_time
	found event stalled-cycles-backend
	found event instructions
	found event stalled-cycles-frontend
	Parsing metric events '{stalled-cycles-backend/metric-id=stalled!1cycles!1backend/,instructions/metric-id=instructions/,stalled-cycles-frontend/metric-id=stalled!1cycles!1frontend/}:W,duration_time'
	stalled-cycles-backend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1backend,stalled-cycles-backend/
	instructions -> armv8_pmuv3_0/metric-id=instructions,instructions/
	stalled-cycles-frontend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1frontend,stalled-cycles-frontend/
	duration_time -> tool/duration_time/
	Matched metric-id stalled-cycles-backend to stalled-cycles-backend
	Matched metric-id instructions to instructions
	Matched metric-id stalled-cycles-frontend to stalled-cycles-frontend
	Matched metric-id duration_time to duration_time
	Matched metric-id software@page-faults,name=page-faults@ to page-faults
	Matched metric-id software@task-clock,name=task-clock@ to task-clock
	Matched metric-id software@task-clock,name=task-clock@ to task-clock
	Matched metric-id software@cpu-migrations,name=cpu-migrations@ to cpu-migrations
	found event duration_time
	found event cpu-cycles
	found event instructions
	Parsing metric events '{cpu-cycles/metric-id=cpu!1cycles/,instructions/metric-id=instructions/}:W,duration_time'
	cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
	instructions -> armv8_pmuv3_0/metric-id=instructions,instructions/
	duration_time -> tool/duration_time/
	Matched metric-id cpu-cycles to cpu-cycles
	Matched metric-id instructions to instructions
	Matched metric-id duration_time to duration_time
	found event duration_time
	found event cpu-cycles
	found event stalled-cycles-frontend
	Parsing metric events '{cpu-cycles/metric-id=cpu!1cycles/,stalled-cycles-frontend/metric-id=stalled!1cycles!1frontend/}:W,duration_time'
	cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
	stalled-cycles-frontend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1frontend,stalled-cycles-frontend/
	duration_time -> tool/duration_time/
	Matched metric-id cpu-cycles to cpu-cycles
	Matched metric-id stalled-cycles-frontend to stalled-cycles-frontend
	Matched metric-id duration_time to duration_time
	Matched metric-id software@task-clock,name=task-clock@ to task-clock
	Matched metric-id cpu-cycles to cpu-cycles
	Matched metric-id software@context-switches,name=context-switches@ to context-switches
	Matched metric-id software@task-clock,name=task-clock@ to task-clock
	found event duration_time
	found event branch-misses
	found event branches
	Parsing metric events '{branch-misses/metric-id=branch!1misses/,branches/metric-id=branches/}:W,duration_time'
	branch-misses -> armv8_pmuv3_0/metric-id=branch!1misses,branch-misses/
	branches -> armv8_pmuv3_0/metric-id=branches,branches/
	duration_time -> tool/duration_time/
	Matched metric-id branch-misses to branch-misses
	Matched metric-id branches to branches
	Matched metric-id duration_time to duration_time
	Matched metric-id software@task-clock,name=task-clock@ to task-clock
	Matched metric-id branches to branches
	found event duration_time
	found event cpu-cycles
	found event stalled-cycles-backend
	Parsing metric events '{cpu-cycles/metric-id=cpu!1cycles/,stalled-cycles-backend/metric-id=stalled!1cycles!1backend/}:W,duration_time'
	cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
	stalled-cycles-backend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1backend,stalled-cycles-backend/
	duration_time -> tool/duration_time/
	Matched metric-id cpu-cycles to cpu-cycles
	Matched metric-id stalled-cycles-backend to stalled-cycles-backend
	Matched metric-id duration_time to duration_time
	Matched metric-id software@task-clock,name=task-clock@ to task-clock
	Matched metric-id duration_time to duration_time
	copying metric event for cgroup 'root': context-switches (idx=0)
	copying metric event for cgroup 'root': page-faults (idx=1)
	copying metric event for cgroup 'root': task-clock (idx=2)
	copying metric event for cgroup 'root': cpu-cycles (idx=3)
	copying metric event for cgroup 'root': branches (idx=4)
	copying metric event for cgroup 'root': cpu-migrations (idx=5)
	copying metric event for cgroup 'root': STALL_SLOT_FRONTEND (idx=7)
	copying metric event for cgroup 'root': stalled-cycles-backend (idx=29)
	copying metric event for cgroup 'root': STALL_SLOT_BACKEND (idx=11)
	copying metric event for cgroup 'root': stalled-cycles-backend (idx=15)
	copying metric event for cgroup 'root': instructions (idx=20)
	copying metric event for cgroup 'root': stalled-cycles-frontend (idx=23)
	copying metric event for cgroup 'root': branch-misses (idx=25)
	copying metric event for cgroup 'root': context-switches (idx=6)
	copying metric event for cgroup 'root': page-faults (idx=8)
	copying metric event for cgroup 'root': task-clock (idx=9)
	copying metric event for cgroup 'root': cpu-cycles (idx=13)
	copying metric event for cgroup 'root': branches (idx=12)
	copying metric event for cgroup 'root': cpu-migrations (idx=7)
	copying metric event for cgroup 'root': STALL_SLOT_FRONTEND (idx=25)
	copying metric event for cgroup 'root': stalled-cycles-backend (idx=19)
	copying metric event for cgroup 'root': STALL_SLOT_BACKEND (idx=29)
	copying metric event for cgroup 'root': stalled-cycles-backend (idx=20)
	copying metric event for cgroup 'root': instructions (idx=15)
	copying metric event for cgroup 'root': stalled-cycles-frontend (idx=17)
	copying metric event for cgroup 'root': branch-misses (idx=10)
	Control descriptor is not initialized
	perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
	[1]    832866 IOT instruction (core dumped)  ./tools/perf/perf stat -v sleep 0.01


> The PMUs for cpu-cycles and stalled-cycles differ? This may mean we
> also need to be smarter in determining PMUs for legacy events.
> 
> It'd be interesting to see what events are coming from the kernel, e.g.:
> ```
> $ ls /sys/bus/event_source/devices/*/events

	# ls /sys/bus/event_source/devices/*/events
	/sys/bus/event_source/devices/armv8_pmuv3_0/events:
	br_mis_pred          cti_trigout7        l1d_tlb_refill      l2d_tlb_refill         mem_access_checked_wr  stall_backend_mem
	br_mis_pred_retired  dtlb_walk           l1i_cache           l3d_cache              memory_error           stall_frontend
	br_pred              exc_return          l1i_cache_lmiss     l3d_cache_allocate     op_retired             stall_slot
	br_retired           exc_taken           l1i_cache_refill    l3d_cache_lmiss_rd     op_spec                stall_slot_backend
	bus_access           inst_retired        l1i_tlb             l3d_cache_refill       remote_access          stall_slot_frontend
	bus_cycles           inst_spec           l1i_tlb_refill      ld_align_lat           sample_collision       trb_wrap
	cid_write_retired    itlb_walk           l2d_cache           ldst_align_lat         sample_feed            trcextout0
	cnt_cycles           l1d_cache           l2d_cache_allocate  ll_cache_miss_rd       sample_filtrate        trcextout1
	cpu_cycles           l1d_cache_lmiss_rd  l2d_cache_lmiss_rd  ll_cache_rd            sample_pop             trcextout2
	cti_trigout4         l1d_cache_refill    l2d_cache_refill    mem_access             st_align_lat           trcextout3
	cti_trigout5         l1d_cache_wb        l2d_cache_wb        mem_access_checked     stall                  ttbr_write_retired
	cti_trigout6         l1d_tlb             l2d_tlb             mem_access_checked_rd  stall_backend

	/sys/bus/event_source/devices/cs_etm/events:
	autofdo

	/sys/bus/event_source/devices/nvidia_cnvlink_pmu_0/events:
	cycles        rd_bytes_rem     rd_cum_outs_rem  rd_req_rem       total_bytes_rem  total_req_rem  wr_bytes_rem  wr_req_rem
	rd_bytes_loc  rd_cum_outs_loc  rd_req_loc       total_bytes_loc  total_req_loc    wr_bytes_loc   wr_req_loc

	/sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_0/events:
	cycles        rd_bytes_rem     rd_cum_outs_rem  rd_req_rem       total_bytes_rem  total_req_rem  wr_bytes_rem  wr_req_rem
	rd_bytes_loc  rd_cum_outs_loc  rd_req_loc       total_bytes_loc  total_req_loc    wr_bytes_loc   wr_req_loc

	/sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_0/events:
	cycles        rd_bytes_rem     rd_cum_outs_rem  rd_req_rem       total_bytes_rem  total_req_rem  wr_bytes_rem  wr_req_rem
	rd_bytes_loc  rd_cum_outs_loc  rd_req_loc       total_bytes_loc  total_req_loc    wr_bytes_loc   wr_req_loc

	/sys/bus/event_source/devices/nvidia_pcie_pmu_0/events:
	cycles        rd_bytes_rem     rd_cum_outs_rem  rd_req_rem       total_bytes_rem  total_req_rem  wr_bytes_rem  wr_req_rem
	rd_bytes_loc  rd_cum_outs_loc  rd_req_loc       total_bytes_loc  total_req_loc    wr_bytes_loc   wr_req_loc

	/sys/bus/event_source/devices/nvidia_scf_pmu_0/events:
	bus_cycles           gmem_rd_data                  scf_cache                socket_1_rd_access       socket_2_wb_data
	cmem_rd_access       gmem_rd_outstanding           scf_cache_allocate       socket_1_rd_data         socket_2_wr_access
	cmem_rd_data         gmem_wb_access                scf_cache_refill         socket_1_rd_outstanding  socket_2_wr_data
	cmem_rd_outstanding  gmem_wb_data                  scf_cache_wb             socket_1_wb_access       socket_3_rd_access
	cmem_wb_access       gmem_wr_access                socket_0_rd_access       socket_1_wb_data         socket_3_rd_data
	cmem_wb_data         gmem_wr_data                  socket_0_rd_data         socket_1_wr_access       socket_3_rd_outstanding
	cmem_wr_access       gmem_wr_total_bytes           socket_0_rd_outstanding  socket_1_wr_data         socket_3_wb_access
	cmem_wr_data         remote_socket_rd_access       socket_0_wb_access       socket_2_rd_access       socket_3_wb_data
	cmem_wr_total_bytes  remote_socket_rd_data         socket_0_wb_data         socket_2_rd_data         socket_3_wr_access
	cycles               remote_socket_rd_outstanding  socket_0_wr_access       socket_2_rd_outstanding  socket_3_wr_data
	gmem_rd_access       remote_socket_wr_total_bytes  socket_0_wr_data         socket_2_wb_access

	/sys/bus/event_source/devices/smmuv3_pmcg_11002/events:
	config_cache_miss  config_struct_access  cycles  pcie_ats_trans_rq  tlb_miss  transaction  trans_table_walk_access

	/sys/bus/event_source/devices/smmuv3_pmcg_11042/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_11062/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_11082/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_110a2/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_12002/events:
	config_cache_miss  config_struct_access  cycles  pcie_ats_trans_rq  tlb_miss  transaction  trans_table_walk_access

	/sys/bus/event_source/devices/smmuv3_pmcg_12042/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_12062/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_12082/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_120a2/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_15002/events:
	config_cache_miss  config_struct_access  cycles  pcie_ats_trans_rq  tlb_miss  transaction  trans_table_walk_access

	/sys/bus/event_source/devices/smmuv3_pmcg_15042/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_15062/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_15082/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_150a2/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_16002/events:
	config_cache_miss  config_struct_access  cycles  pcie_ats_trans_rq  tlb_miss  transaction  trans_table_walk_access

	/sys/bus/event_source/devices/smmuv3_pmcg_16042/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_16062/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_5002/events:
	config_cache_miss  config_struct_access  cycles  pcie_ats_trans_rq  tlb_miss  transaction  trans_table_walk_access

	/sys/bus/event_source/devices/smmuv3_pmcg_5042/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction

	/sys/bus/event_source/devices/smmuv3_pmcg_5062/events:
	cycles  pcie_ats_trans_passed  tlb_miss  transaction


> ```
> and the cpuid to match it up with the json.
> ```
> $ perf stat -v sleep 1 2>&1 |head -1

	# perf stat -v sleep 1 2>&1 |head -1
	Using CPUID 0x00000000410fd4f0

> This information is in the verbose output too:
> ```
> $ perf stat -vv sleep 1

	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             1 (PERF_TYPE_SOFTWARE)
	size                             144
	config                           0x3 (PERF_COUNT_SW_CONTEXT_SWITCHES)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 3
	------------------------------------------------------------
	perf_event_attr:
	type                             1 (PERF_TYPE_SOFTWARE)
	size                             144
	config                           0x4 (PERF_COUNT_SW_CPU_MIGRATIONS)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 4
	------------------------------------------------------------
	perf_event_attr:
	type                             1 (PERF_TYPE_SOFTWARE)
	size                             144
	config                           0x2 (PERF_COUNT_SW_PAGE_FAULTS)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 5
	------------------------------------------------------------
	perf_event_attr:
	type                             1 (PERF_TYPE_SOFTWARE)
	size                             144
	config                           0x1 (PERF_COUNT_SW_TASK_CLOCK)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 7
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x5 (PERF_COUNT_HW_BRANCH_MISSES)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 8
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 8  flags 0x8 = 9
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 10
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0 (PERF_COUNT_HW_CPU_CYCLES)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 11
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0 (PERF_COUNT_HW_CPU_CYCLES)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 12
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x1 (PERF_COUNT_HW_INSTRUCTIONS)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 12  flags 0x8 = 13
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0 (PERF_COUNT_HW_CPU_CYCLES)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 14
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x7 (PERF_COUNT_HW_STALLED_CYCLES_FRONTEND)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 14  flags 0x8 = 15
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0 (PERF_COUNT_HW_CPU_CYCLES)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 16
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x8 (PERF_COUNT_HW_STALLED_CYCLES_BACKEND)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 16  flags 0x8 = 17
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x8 (PERF_COUNT_HW_STALLED_CYCLES_BACKEND)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 18
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x1 (PERF_COUNT_HW_INSTRUCTIONS)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 18  flags 0x8 = 19
	------------------------------------------------------------
	perf_event_attr:
	type                             0 (PERF_TYPE_HARDWARE)
	size                             144
	config                           0x7 (PERF_COUNT_HW_STALLED_CYCLES_FRONTEND)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 18  flags 0x8 = 20
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             4294967294 (tool)
	size                             144
	config                           0x1 (duration_time)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	------------------------------------------------------------
	perf_event_attr:
	type                             10 (armv8_pmuv3_0)
	size                             144
	config                           0x10 (br_mis_pred)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	disabled                         1
	inherit                          1
	enable_on_exec                   1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd -1  flags 0x8 = 21
	------------------------------------------------------------
	perf_event_attr:
	type                             10 (armv8_pmuv3_0)
	size                             144
	config                           0x3b (op_spec)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 21  flags 0x8 = 22
	------------------------------------------------------------
	perf_event_attr:
	type                             10 (armv8_pmuv3_0)
	size                             144
	config                           0x3f (stall_slot)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 21  flags 0x8 = 23
	------------------------------------------------------------
	perf_event_attr:
	type                             10 (armv8_pmuv3_0)
	size                             144
	config                           0x11 (cpu_cycles)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 21  flags 0x8 = 24
	------------------------------------------------------------
	perf_event_attr:
	type                             10 (armv8_pmuv3_0)
	size                             144
	config                           0x3a (op_retired)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------
	sys_perf_event_open: pid 865887  cpu -1  group_fd 21  flags 0x8 = 25
	------------------------------------------------------------
	perf_event_attr:
	type                             10 (armv8_pmuv3_0)
	size                             144
	config                           0x11 (cpu_cycles)
	sample_type                      IDENTIFIER
	read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
	inherit                          1
	------------------------------------------------------------

Thanks for your help,
--breno
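
In the -vv output above, each group leader is opened with `group_fd -1`
and each member passes its leader's fd: op_spec, stall_slot, cpu_cycles
and op_retired all use `group_fd 21`, the fd returned for br_mis_pred
(`flags 0x8` is PERF_FLAG_FD_CLOEXEC). The following is a minimal
userspace model of that loop, assuming events are opened strictly in list
order. It shows why a member sorted ahead of its leader trips the
get_group_fd() assertion. `struct model_event` and `open_in_list_order()`
are hypothetical, and no real perf_event_open() call is made; fds are
just handed out sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified event: only what the open loop needs. */
struct model_event {
	const char *name;
	size_t leader;  /* index of the group leader; own index for a leader */
	int fd;         /* -1 until "opened" */
	int group_fd;   /* what would be passed to perf_event_open() */
};

/*
 * Walk the list in order, handing out fake fds starting at first_fd.
 * Returns the index of the first member whose leader has no fd yet (the
 * situation get_group_fd() asserts on), or -1 if every event was opened.
 */
static long open_in_list_order(struct model_event *ev, size_t n, int first_fd)
{
	int next_fd = first_fd;

	for (size_t i = 0; i < n; i++)
		ev[i].fd = -1;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].leader == i) {
			ev[i].group_fd = -1;    /* leaders open a new group */
		} else {
			int leader_fd = ev[ev[i].leader].fd;

			if (leader_fd < 0)
				return (long)i; /* leader not opened yet */
			ev[i].group_fd = leader_fd;
		}
		ev[i].fd = next_fd++;
	}
	return -1;
}
```

With the leader first, members inherit its fd; with a member first, the
walk stops at index 0.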
Re: [PATCH] perf stat: Fix crash on arm64
Posted by Leo Yan 1 day, 11 hours ago
Hi Ian,

On Thu, Feb 05, 2026 at 08:59:07AM -0800, Ian Rogers wrote:
> On Thu, Feb 5, 2026 at 3:46 AM Breno Leitao <leitao@debian.org> wrote:
> >
> > Perf stat is crashing on arm64 hosts with the following issue:
> >
> >         # make -C tools/perf DEBUG=1
> >         # perf stat sleep 1
> >         perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> >         [1]    1220794 IOT instruction (core dumped)  ./perf stat
> >
> > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > Sort default events/metrics") compares events based on their individual
> > properties. This can cause events from different groups to be
> > interleaved, resulting in group members appearing before their leaders
> > in the sorted evlist.
> 
> Hi, sorry for the issue. I can see what you're saying but why is this
> an arm64 issue? The legacy Default metrics are common to all
> architectures:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next

Since you mention common metrics: I found that the common metrics do
not work on the Arm64 platform (I tried building both with NO_JEVENTS=1
and with jevents enabled, and neither works).

The latest perf produces no output at all if the CPU type is missing
from the JSON and perf falls back to the common metrics. The failure
path is:

  add_default_events()
    metricgroup__parse_groups()
      pmu_metrics_table__find()  => return NULL

In my case, pmu_metrics_table__find() always returns NULL; as a result,
`perf stat sleep 1` bails out immediately without any output.

I suspect Breno's environment has the corresponding CPU JSON files,
which may differ from my test machine.

Thanks,
Leo
Re: [PATCH] perf stat: Fix crash on arm64
Posted by Leo Yan 1 day, 11 hours ago
On Thu, Feb 05, 2026 at 05:39:18PM +0000, Leo Yan wrote:

> > > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > > Sort default events/metrics") compares events based on their individual
> > > properties. This can cause events from different groups to be
> > > interleaved, resulting in group members appearing before their leaders
> > > in the sorted evlist.
> > 
> > Hi, sorry for the issue. I can see what you're saying but why is this
> > an arm64 issue? The legacy Default metrics are common to all
> > architectures:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> 
> Since you mention common metrics: I found that the common metrics do
> not work on the Arm64 platform (I tried building both with NO_JEVENTS=1
> and with jevents enabled, and neither works).
> 
> The latest perf produces no output at all if the CPU type is missing
> from the JSON and perf falls back to the common metrics. The failure
> path is:
> 
>   add_default_events()
>     metricgroup__parse_groups()
>       pmu_metrics_table__find()  => return NULL
> 
> In my case, pmu_metrics_table__find() always returns NULL; as a result,
> `perf stat sleep 1` bails out immediately without any output.
> 
> I suspect Breno's environment has the corresponding CPU JSON files,
> which may differ from my test machine.

In my local environment, I need the following fix:

diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index e4d00f6b2b5d..f74acc206856 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -3237,14 +3237,6 @@ const struct pmu_events_table *perf_pmu__default_core_events_table(void)
         return NULL;
 }
 
-const struct pmu_metrics_table *pmu_metrics_table__find(void)
-{
-        struct perf_cpu cpu = {-1};
-        const struct pmu_events_map *map = map_for_cpu(cpu);
-
-        return map ? &map->metric_table : NULL;
-}
-
 const struct pmu_metrics_table *pmu_metrics_table__default(void)
 {
         int i = 0;
@@ -3261,6 +3253,17 @@ const struct pmu_metrics_table *pmu_metrics_table__default(void)
         return NULL;
 }
 
+const struct pmu_metrics_table *pmu_metrics_table__find(void)
+{
+        struct perf_cpu cpu = {-1};
+        const struct pmu_events_map *map = map_for_cpu(cpu);
+
+       if (map)
+               return &map->metric_table;
+
+       return pmu_metrics_table__default();
+}
+

I don't have a deep understanding of jevents, but it seems to me that
Breno's issue is different from mine. Please kindly confirm.

Thanks,
Leo
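
Leo's change makes pmu_metrics_table__find() return the default (common)
metrics table when no CPU-specific map matches, instead of returning
NULL. The shape of that fallback is sketched below. The types and
functions are hypothetical stand-ins, and the lookup uses an exact string
compare, which is a simplification of how perf actually matches CPUIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the jevents tables, not perf's real types. */
struct model_metrics_table {
	const char *name;
};

struct model_map {
	const char *cpuid;
	struct model_metrics_table metric_table;
};

/* Plays the role of the common/default metrics table. */
static const struct model_metrics_table model_default_table = { "common" };

/* Simplified lookup: exact string match on the CPUID. */
static const struct model_map *model_map_for_cpuid(const struct model_map *maps,
						  size_t n, const char *cpuid)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(maps[i].cpuid, cpuid) == 0)
			return &maps[i];
	return NULL;
}

/* Same shape as Leo's change: CPU-specific table if found, else the default. */
static const struct model_metrics_table *
model_metrics_table_find(const struct model_map *maps, size_t n, const char *cpuid)
{
	const struct model_map *map = model_map_for_cpuid(maps, n, cpuid);

	if (map)
		return &map->metric_table;
	return &model_default_table;
}
```

With this shape, callers never see NULL, so an early return on NULL in
the caller can no longer trigger.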
Re: [PATCH] perf stat: Fix crash on arm64
Posted by Ian Rogers 1 day, 10 hours ago
On Thu, Feb 5, 2026 at 9:52 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Thu, Feb 05, 2026 at 05:39:18PM +0000, Leo Yan wrote:
>
> > > > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > > > Sort default events/metrics") compares events based on their individual
> > > > properties. This can cause events from different groups to be
> > > > interleaved, resulting in group members appearing before their leaders
> > > > in the sorted evlist.
> > >
> > > Hi, sorry for the issue. I can see what you're saying but why is this
> > > an arm64 issue? The legacy Default metrics are common to all
> > > architectures:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> >
> > Since you mention common metrics: I found that the common metrics do
> > not work on the Arm64 platform (I tried building both with NO_JEVENTS=1
> > and with jevents enabled, and neither works).
> >
> > The latest perf produces no output at all if the CPU type is missing
> > from the JSON and perf falls back to the common metrics. The failure
> > path is:
> >
> >   add_default_events()
> >     metricgroup__parse_groups()
> >       pmu_metrics_table__find()  => return NULL
> >

Returning NULL there is correct, but the early return is wrong: the metric
code was updated to always consider the default table and to skip a NULL
table:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=perf-tools-next#n430
I'll send a patch for the early return.
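
The difference between the two caller behaviours can be modelled in a few
lines. This is a hypothetical sketch, not perf's metricgroup.c: a "table"
is just a list of metric group names, and the group names in the usage
are illustrative. The first function bails out when the CPU-specific
table is NULL, matching the failure path Leo describes; the second
consults both tables and simply skips a NULL one, as Ian describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metrics table: just a list of metric group names. */
struct model_table {
	const char *const *groups;
	size_t n;
};

static size_t count_group(const struct model_table *t, const char *group)
{
	size_t found = 0;

	for (size_t i = 0; i < t->n; i++)
		if (strcmp(t->groups[i], group) == 0)
			found++;
	return found;
}

/* Buggy shape: a missing CPU table ends the search (default assumed non-NULL). */
static size_t parse_groups_early_return(const struct model_table *cpu_table,
					const struct model_table *default_table,
					const char *group)
{
	if (!cpu_table)
		return 0;
	return count_group(cpu_table, group) + count_group(default_table, group);
}

/* Fixed shape: always consider both tables, skipping any NULL one. */
static size_t parse_groups_skip_null(const struct model_table *cpu_table,
				     const struct model_table *default_table,
				     const char *group)
{
	const struct model_table *tables[] = { cpu_table, default_table };
	size_t found = 0;

	for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++) {
		if (!tables[i])
			continue;
		found += count_group(tables[i], group);
	}
	return found;
}
```

With no CPU-specific table, the first version finds nothing and perf stat
would print nothing, while the second still finds the default groups.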

> > In my case, pmu_metrics_table__find() always returns NULL; as a result,
> > `perf stat sleep 1` bails out immediately without any output.
> >
> > I suspect Breno's environment has the corresponding CPU JSON files,
> > which may differ from my test machine.
>
> In my local environment, I need the following fix:
>
> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> index e4d00f6b2b5d..f74acc206856 100644
> --- a/tools/perf/pmu-events/empty-pmu-events.c
> +++ b/tools/perf/pmu-events/empty-pmu-events.c
> @@ -3237,14 +3237,6 @@ const struct pmu_events_table *perf_pmu__default_core_events_table(void)
>          return NULL;
>  }
>
> -const struct pmu_metrics_table *pmu_metrics_table__find(void)
> -{
> -        struct perf_cpu cpu = {-1};
> -        const struct pmu_events_map *map = map_for_cpu(cpu);
> -
> -        return map ? &map->metric_table : NULL;
> -}
> -
>  const struct pmu_metrics_table *pmu_metrics_table__default(void)
>  {
>          int i = 0;
> @@ -3261,6 +3253,17 @@ const struct pmu_metrics_table *pmu_metrics_table__default(void)
>          return NULL;
>  }
>
> +const struct pmu_metrics_table *pmu_metrics_table__find(void)
> +{
> +        struct perf_cpu cpu = {-1};
> +        const struct pmu_events_map *map = map_for_cpu(cpu);
> +
> +       if (map)
> +               return &map->metric_table;
> +
> +       return pmu_metrics_table__default();
> +}
> +
>
> I don't have a deep understanding of jevents, but it seems to me that
> Breno's issue is different from mine. Please kindly confirm.

I think it is a different issue: they have metrics while you don't.
Your report does highlight that we're missing a NO_JEVENTS=1 build test,
but the build works for me. I'll send out two patches for these
issues.

Thanks,
Ian

> Thanks,
> Leo
Re: [PATCH] perf stat: Fix crash on arm64
Posted by Dmitry Ilvokhin 1 day, 15 hours ago
On Thu, Feb 05, 2026 at 03:46:31AM -0800, Breno Leitao wrote:
> Perf stat is crashing on arm64 hosts with the following issue:
> 
> 	# make -C tools/perf DEBUG=1
> 	# perf stat sleep 1
> 	perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> 	[1]    1220794 IOT instruction (core dumped)  ./perf stat
> 
> The sorting function introduced by commit a745c0831c15c ("perf stat:
> Sort default events/metrics") compares events based on their individual
> properties. This can cause events from different groups to be
> interleaved, resulting in group members appearing before their leaders
> in the sorted evlist.
> 
> When the iterator opens events in list order, a group member may be
> processed before its leader has been opened.
> 
> For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> could be sorted before its leader, causing the crash when CPU_CYCLES
> tries to get its group fd from the not-yet-opened leader.
> 
> Fix this by comparing events based on their leader's attributes instead
> of their own attributes when the events are in different groups. This
> ensures all members of a group share the same sort key as their leader,
> keeping groups together and guaranteeing leaders are opened before their
> members.
> 
> Reported-by: Denis Yaroshevskiy <dyaroshev@meta.com>
> Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> Cc; linux-arm-kernel@lists.infradead.org
> ---
>  tools/perf/builtin-stat.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index ab40d85fb1259..3a423ca31d8d3 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1938,25 +1938,33 @@ static int default_evlist_evsel_cmp(void *priv __maybe_unused,
>  	const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
>  	const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
>  	const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
> +	const struct evsel *lhs_leader = evsel__leader(lhs);
> +	const struct evsel *rhs_leader = evsel__leader(rhs);
>  
> -	if (evsel__leader(lhs) == evsel__leader(rhs)) {
> +	if (lhs_leader == rhs_leader) {
>  		/* Within the same group, respect the original order. */
>  		return lhs_core->idx - rhs_core->idx;
>  	}
>  
> +	/*
> +	 * Compare using leader's attributes so that all members of a group
> +	 * stay together. This ensures leaders are opened before their members.
> +	 */
> +
>  	/* Sort default metrics evsels first, and default show events before those. */
> -	if (lhs->default_metricgroup != rhs->default_metricgroup)
> -		return lhs->default_metricgroup ? -1 : 1;
> +	if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
> +		return lhs_leader->default_metricgroup ? -1 : 1;
>  
> -	if (lhs->default_show_events != rhs->default_show_events)
> -		return lhs->default_show_events ? -1 : 1;
> +	if (lhs_leader->default_show_events != rhs_leader->default_show_events)
> +		return lhs_leader->default_show_events ? -1 : 1;
>  
>  	/* Sort by PMU type (prefers legacy types first). */
> -	if (lhs->pmu != rhs->pmu)
> -		return lhs->pmu->type - rhs->pmu->type;
> +	if (lhs_leader->pmu != rhs_leader->pmu)
> +		return lhs_leader->pmu->type - rhs_leader->pmu->type;
>  
> -	/* Sort by name. */
> -	return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
> +	/* Sort by leader's name. */
> +	return strcmp(evsel__name((struct evsel *)lhs_leader),
> +		      evsel__name((struct evsel *)rhs_leader));
>  }
>  
>  /*
> 
> ---
> base-commit: 5fd0a1df5d05ad066e5618ccdd3d0fa6cb686c27
> change-id: 20260205-perf_stat-a0a2a37e21c5
> 
> Best regards,
> --  
> Breno Leitao <leitao@debian.org>
> 

Tested-by: Dmitry Ilvokhin <d@ilvokhin.com>
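
The effect of keying cross-group comparisons on the leader can be checked
with a small userspace model. This is not perf code. `struct model_evsel`,
its fields, and `leaders_precede_members()` are hypothetical stand-ins for
struct evsel and the evlist. The model sorts an array of pointers with
qsort() rather than the evlist with list_sort(), and it assumes each
leader has a lower idx than its members, which the within-group idx
comparison relies on. Because qsort() is not stable, the sketch adds a
final tie-break on the leader's idx so that two groups whose leaders
compare equal (for example, several groups led by cycles on the same PMU)
cannot interleave. The patch's comparator has the list_sort() callback
signature, and list_sort() is a stable merge sort, which covers that case
there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct evsel (not perf's layout). */
struct model_evsel {
	int idx;                          /* position in the original evlist */
	const struct model_evsel *leader; /* points to itself for a leader */
	bool default_metricgroup;
	bool default_show_events;
	unsigned int pmu_type;
	const char *name;
};

/* qsort() callback over an array of pointers, keyed on the group leader. */
static int model_evsel_cmp(const void *a, const void *b)
{
	const struct model_evsel *lhs = *(const struct model_evsel *const *)a;
	const struct model_evsel *rhs = *(const struct model_evsel *const *)b;
	const struct model_evsel *lhs_leader = lhs->leader;
	const struct model_evsel *rhs_leader = rhs->leader;
	int ret;

	/* Within the same group, respect the original order. */
	if (lhs_leader == rhs_leader)
		return lhs->idx - rhs->idx;

	/* Different groups: compare leaders, so members share their leader's key. */
	if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
		return lhs_leader->default_metricgroup ? -1 : 1;
	if (lhs_leader->default_show_events != rhs_leader->default_show_events)
		return lhs_leader->default_show_events ? -1 : 1;
	if (lhs_leader->pmu_type != rhs_leader->pmu_type)
		return lhs_leader->pmu_type < rhs_leader->pmu_type ? -1 : 1;
	ret = strcmp(lhs_leader->name, rhs_leader->name);
	if (ret)
		return ret;
	/* Model-only tie-break, needed because qsort() is not stable. */
	return lhs_leader->idx - rhs_leader->idx;
}

/* True if every member appears after its leader in the sorted list. */
static bool leaders_precede_members(const struct model_evsel *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bool seen_leader = false;

		if (list[i]->leader == list[i])
			continue;
		for (size_t j = 0; j < i; j++)
			if (list[j] == list[i]->leader)
				seen_leader = true;
		if (!seen_leader)
			return false;
	}
	return true;
}
```

In the usage below, the group led by br_mis_pred sorts ahead of the group
led by stall_slot_backend, and cpu_cycles stays behind its own leader even
though its name sorts earlier.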