Perf stat is crashing on arm64 hosts with the following issue:
# make -C tools/perf DEBUG=1
# perf stat sleep 1
perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
[1] 1220794 IOT instruction (core dumped) ./perf stat
The sorting function introduced by commit a745c0831c15c ("perf stat:
Sort default events/metrics") compares events based on their individual
properties. This can cause events from different groups to be
interleaved, resulting in group members appearing before their leaders
in the sorted evlist.
When the iterator opens events in list order, a group member may be
processed before its leader has been opened.
For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
could be sorted before its leader, causing the crash when CPU_CYCLES
tries to get its group fd from the not-yet-opened leader.
Fix this by comparing events based on their leader's attributes instead
of their own attributes when the events are in different groups. This
ensures all members of a group share the same sort key as their leader,
keeping groups together and guaranteeing leaders are opened before their
members.
Reported-by: Denis Yaroshevskiy <dyaroshev@meta.com>
Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Cc: linux-arm-kernel@lists.infradead.org
---
tools/perf/builtin-stat.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ab40d85fb1259..3a423ca31d8d3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1938,25 +1938,33 @@ static int default_evlist_evsel_cmp(void *priv __maybe_unused,
 	const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
 	const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
 	const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
+	const struct evsel *lhs_leader = evsel__leader(lhs);
+	const struct evsel *rhs_leader = evsel__leader(rhs);
 
-	if (evsel__leader(lhs) == evsel__leader(rhs)) {
+	if (lhs_leader == rhs_leader) {
 		/* Within the same group, respect the original order. */
 		return lhs_core->idx - rhs_core->idx;
 	}
 
+	/*
+	 * Compare using leader's attributes so that all members of a group
+	 * stay together. This ensures leaders are opened before their members.
+	 */
+
 	/* Sort default metrics evsels first, and default show events before those. */
-	if (lhs->default_metricgroup != rhs->default_metricgroup)
-		return lhs->default_metricgroup ? -1 : 1;
+	if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
+		return lhs_leader->default_metricgroup ? -1 : 1;
 
-	if (lhs->default_show_events != rhs->default_show_events)
-		return lhs->default_show_events ? -1 : 1;
+	if (lhs_leader->default_show_events != rhs_leader->default_show_events)
+		return lhs_leader->default_show_events ? -1 : 1;
 
 	/* Sort by PMU type (prefers legacy types first). */
-	if (lhs->pmu != rhs->pmu)
-		return lhs->pmu->type - rhs->pmu->type;
+	if (lhs_leader->pmu != rhs_leader->pmu)
+		return lhs_leader->pmu->type - rhs_leader->pmu->type;
 
-	/* Sort by name. */
-	return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
+	/* Sort by leader's name. */
+	return strcmp(evsel__name((struct evsel *)lhs_leader),
+		      evsel__name((struct evsel *)rhs_leader));
 }
 
 /*
---
base-commit: 5fd0a1df5d05ad066e5618ccdd3d0fa6cb686c27
change-id: 20260205-perf_stat-a0a2a37e21c5
Best regards,
--
Breno Leitao <leitao@debian.org>
On Thu, Feb 5, 2026 at 3:46 AM Breno Leitao <leitao@debian.org> wrote:
>
> Perf stat is crashing on arm64 hosts with the following issue:
>
> # make -C tools/perf DEBUG=1
> # perf stat sleep 1
> perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> [1] 1220794 IOT instruction (core dumped) ./perf stat
>
> The sorting function introduced by commit a745c0831c15c ("perf stat:
> Sort default events/metrics") compares events based on their individual
> properties. This can cause events from different groups to be
> interleaved, resulting in group members appearing before their leaders
> in the sorted evlist.
Hi, sorry for the issue. I can see what you're saying but why is this
an arm64 issue? The legacy Default metrics are common to all
architectures:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> When the iterator opens events in list order, a group member may be
> processed before its leader has been opened.
>
> For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> could be sorted before its leader, causing the crash when CPU_CYCLES
> tries to get its group fd from the not-yet-opened leader.
Which metric is this?
> Fix this by comparing events based on their leader's attributes instead
> of their own attributes when the events are in different groups. This
> ensures all members of a group share the same sort key as their leader,
> keeping groups together and guaranteeing leaders are opened before their
> members.
This makes sense but I'm not understanding why this problem wasn't
seen previously. I'm guessing that in a metric like
backend_cycles_idle:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next#n63
```
"BriefDescription": "Backend stalls per cycle",
"MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
"MetricGroup": "Default",
"MetricName": "backend_cycles_idle",
"MetricThreshold": "backend_cycles_idle > 0.2",
"DefaultShowEvents": "1"
```
Do the PMUs for cpu-cycles and stalled-cycles differ? This may mean we
also need to be smarter in determining PMUs for legacy events.
It'd be interesting to see what events are coming from the kernel, e.g.:
```
$ ls /sys/bus/event_source/devices/*/events
/sys/bus/event_source/devices/cpu_atom/events:
branch-instructions  cache-misses      instructions  ref-cycles        topdown-fe-bound
branch-misses        cache-references  mem-loads     topdown-bad-spec  topdown-retiring
bus-cycles           cpu-cycles        mem-stores    topdown-be-bound
...
```
and the cpuid to match it up with the json.
```
$ perf stat -v sleep 1 2>&1 |head -1
Using CPUID GenuineIntel-6-B7-1
$ ./tools/perf/pmu-events/models.py x86 GenuineIntel-6-B7-1
tools/perf/pmu-events/arch/
alderlake
```
This information is in the verbose output too:
```
$ perf stat -vv sleep 1
...
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x2 (PERF_COUNT_SW_PAGE_FAULTS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 608809 cpu -1 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0xa00000001 (cpu_atom/PERF_COUNT_HW_INSTRUCTIONS/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 608809 cpu -1 group_fd -1 flags 0x8 = 8
...
```
Thanks,
Ian
Hello Ian, thanks for the quick reply!
On Thu, Feb 05, 2026 at 08:59:07AM -0800, Ian Rogers wrote:
> On Thu, Feb 5, 2026 at 3:46 AM Breno Leitao <leitao@debian.org> wrote:
> >
> > Perf stat is crashing on arm64 hosts with the following issue:
> >
> > # make -C tools/perf DEBUG=1
> > # perf stat sleep 1
> > perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> > [1] 1220794 IOT instruction (core dumped) ./perf stat
> >
> > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > Sort default events/metrics") compares events based on their individual
> > properties. This can cause events from different groups to be
> > interleaved, resulting in group members appearing before their leaders
> > in the sorted evlist.
>
> Hi, sorry for the issue. I can see what you're saying but why is this
> an arm64 issue?
Sorry, it's not arm64-specific: the bug is in the generic sort code.
It just happens to manifest on arm64.
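
To illustrate why comparing per-event attributes can pull a member ahead of its leader, here is a minimal sketch in plain C. It is not the perf code: `struct toy_evsel`, `cmp_own`, `cmp_leader` and `has_cycle` are hypothetical names, and the comparators are reduced to the group check plus the name comparison (the default-metric flags and PMU type are left out).

```c
#include <string.h>

/* Toy stand-in for perf's struct evsel; every field here is hypothetical. */
struct toy_evsel {
	const char *name;
	int idx;                        /* position in the original list */
	const struct toy_evsel *leader; /* a group leader points to itself */
};

/* Pre-patch rule: across groups, compare each event's own name. */
static int cmp_own(const struct toy_evsel *l, const struct toy_evsel *r)
{
	if (l->leader == r->leader)
		return l->idx - r->idx;
	return strcmp(l->name, r->name);
}

/* Patched rule: across groups, compare the leaders' names instead. */
static int cmp_leader(const struct toy_evsel *l, const struct toy_evsel *r)
{
	if (l->leader == r->leader)
		return l->idx - r->idx;
	return strcmp(l->leader->name, r->leader->name);
}

typedef int (*toy_cmp)(const struct toy_evsel *, const struct toy_evsel *);

/* Returns 1 if cmp claims a < b, b < c and c < a at once (no valid order). */
static int has_cycle(toy_cmp cmp, const struct toy_evsel *a,
		     const struct toy_evsel *b, const struct toy_evsel *c)
{
	return cmp(a, b) < 0 && cmp(b, c) < 0 && cmp(c, a) < 0;
}
```

Take a BR_MIS_PRED member of the STALL_SLOT_BACKEND group, a CPU_CYCLES member of the STALL_SLOT_FRONTEND group, and the STALL_SLOT_BACKEND leader itself. With `cmp_own` the member sorts before CPU_CYCLES, which sorts before the leader, yet the leader must precede its own member, so no consistent order exists. With `cmp_leader` the three events order cleanly and each group stays contiguous.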
> The legacy Default metrics are common to all
> architectures:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
>
> > When the iterator opens events in list order, a group member may be
> > processed before its leader has been opened.
> >
> > For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> > could be sorted before its leader, causing the crash when CPU_CYCLES
> > tries to get its group fd from the not-yet-opened leader.
>
> Which metric is this?
These are Arm Neoverse metrics; they can be found in
tools/perf/pmu-events/arch/arm64/arm/neoverse-n*
> > Fix this by comparing events based on their leader's attributes instead
> > of their own attributes when the events are in different groups. This
> > ensures all members of a group share the same sort key as their leader,
> > keeping groups together and guaranteeing leaders are opened before their
> > members.
>
> This makes sense but I'm not understanding why this problem wasn't
> seen previously. I'm guessing that in a metric like
> backend_cycles_idle:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next#n63
> ```
> "BriefDescription": "Backend stalls per cycle",
> "MetricExpr": "stalled\\-cycles\\-backend / cpu\\-cycles",
> "MetricGroup": "Default",
> "MetricName": "backend_cycles_idle",
> "MetricThreshold": "backend_cycles_idle > 0.2",
> "DefaultShowEvents": "1"
> ```
I was able to reduce this to the following JSON:
[
    {
        "ArchStdEvent": "backend_bound",
        "MetricExpr": "(100 * ((STALL_SLOT_BACKEND / (CPU_CYCLES * #slots)) - ((BR_MIS_PRED * 3) / CPU_CYCLES)))"
    },
    {
        "ArchStdEvent": "frontend_bound",
        "MetricExpr": "(100 * (((STALL_SLOT_FRONTEND) / (CPU_CYCLES * #slots)) - (BR_MIS_PRED / CPU_CYCLES)))"
    }
]
and then
# ./tools/perf/perf stat -v sleep 0.01
Using CPUID 0x00000000410fd4f0
metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound
metric expr 100 * (STALL_SLOT_FRONTEND / (CPU_CYCLES * #slots) - BR_MIS_PRED / CPU_CYCLES) for frontend_bound
metric expr (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) / (duration_time * 1e9) for CPUs_utilized
metric expr stalled\-cycles\-backend / cpu\-cycles for backend_cycles_idle
metric expr stalled\-cycles\-backend / cpu\-cycles for backend_cycles_idle
metric expr branches / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for branch_frequency
metric expr branch\-misses / branches for branch_miss_rate
metric expr branch\-misses / branches for branch_miss_rate
metric expr software@context\-switches\,name\=context\-switches@ * 1e9 / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for cs_per_second
metric expr cpu\-cycles / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for cycles_frequency
metric expr stalled\-cycles\-frontend / cpu\-cycles for frontend_cycles_idle
metric expr stalled\-cycles\-frontend / cpu\-cycles for frontend_cycles_idle
metric expr instructions / cpu\-cycles for insn_per_cycle
metric expr instructions / cpu\-cycles for insn_per_cycle
metric expr software@cpu\-migrations\,name\=cpu\-migrations@ * 1e9 / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for migrations_per_second
metric expr software@page\-faults\,name\=page\-faults@ * 1e9 / (software@cpu\-clock\,name\=cpu\-clock@ if #target_cpu else software@task\-clock\,name\=task\-clock@) for page_faults_per_second
metric expr max(stalled\-cycles\-frontend, stalled\-cycles\-backend) / instructions for stalled_cycles_per_instruction
hwmon_pmu: failure to open '/sys/class/hwmon/hwmon4/name'
hwmon_pmu: failure to open '/sys/class/hwmon/hwmon5/name'
hwmon_pmu: failure to open '/sys/class/hwmon/hwmon3/name'
found event software@context-switches,name=context-switches@
found event duration_time
found event software@page-faults,name=page-faults@
found event software@task-clock,name=task-clock@
found event cpu-cycles
found event branches
found event software@cpu-migrations,name=cpu-migrations@
Parsing metric events 'software/context-switches,name=context-switches,metric-id=software!3context!1switches!0name!2context!1switches!3/,software/page-faults,name=page-faults,metric-id=software!3page!1faults!0name!2page!1faults!3/,software/task-clock,name=task-clock,metric-id=software!3task!1clock!0name!2task!1clock!3/,cpu-cycles/metric-id=cpu!1cycles/,branches/metric-id=branches/,software/cpu-migrations,name=cpu-migrations,metric-id=software!3cpu!1migrations!0name!2cpu!1migrations!3/,duration_time'
cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
branches -> armv8_pmuv3_0/metric-id=branches,branches/
duration_time -> tool/duration_time/
found event STALL_SLOT_FRONTEND
found event duration_time
found event BR_MIS_PRED
found event CPU_CYCLES
Parsing metric events '{STALL_SLOT_FRONTEND/metric-id=STALL_SLOT_FRONTEND/,BR_MIS_PRED/metric-id=BR_MIS_PRED/,CPU_CYCLES/metric-id=CPU_CYCLES/}:W,duration_time'
STALL_SLOT_FRONTEND -> armv8_pmuv3_0/metric-id=STALL_SLOT_FRONTEND,STALL_SLOT_FRONTEND/
BR_MIS_PRED -> armv8_pmuv3_0/metric-id=BR_MIS_PRED,BR_MIS_PRED/
CPU_CYCLES -> armv8_pmuv3_0/metric-id=CPU_CYCLES,CPU_CYCLES/
duration_time -> tool/duration_time/
Matched metric-id STALL_SLOT_FRONTEND to STALL_SLOT_FRONTEND
Matched metric-id BR_MIS_PRED to BR_MIS_PRED
Matched metric-id CPU_CYCLES to CPU_CYCLES
Matched metric-id duration_time to duration_time
found event STALL_SLOT_BACKEND
found event duration_time
found event BR_MIS_PRED
found event CPU_CYCLES
Parsing metric events '{STALL_SLOT_BACKEND/metric-id=STALL_SLOT_BACKEND/,BR_MIS_PRED/metric-id=BR_MIS_PRED/,CPU_CYCLES/metric-id=CPU_CYCLES/}:W,duration_time'
STALL_SLOT_BACKEND -> armv8_pmuv3_0/metric-id=STALL_SLOT_BACKEND,STALL_SLOT_BACKEND/
BR_MIS_PRED -> armv8_pmuv3_0/metric-id=BR_MIS_PRED,BR_MIS_PRED/
CPU_CYCLES -> armv8_pmuv3_0/metric-id=CPU_CYCLES,CPU_CYCLES/
duration_time -> tool/duration_time/
Matched metric-id STALL_SLOT_BACKEND to STALL_SLOT_BACKEND
Matched metric-id BR_MIS_PRED to BR_MIS_PRED
Matched metric-id CPU_CYCLES to CPU_CYCLES
Matched metric-id duration_time to duration_time
found event duration_time
found event stalled-cycles-backend
found event instructions
found event stalled-cycles-frontend
Parsing metric events '{stalled-cycles-backend/metric-id=stalled!1cycles!1backend/,instructions/metric-id=instructions/,stalled-cycles-frontend/metric-id=stalled!1cycles!1frontend/}:W,duration_time'
stalled-cycles-backend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1backend,stalled-cycles-backend/
instructions -> armv8_pmuv3_0/metric-id=instructions,instructions/
stalled-cycles-frontend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1frontend,stalled-cycles-frontend/
duration_time -> tool/duration_time/
Matched metric-id stalled-cycles-backend to stalled-cycles-backend
Matched metric-id instructions to instructions
Matched metric-id stalled-cycles-frontend to stalled-cycles-frontend
Matched metric-id duration_time to duration_time
Matched metric-id software@page-faults,name=page-faults@ to page-faults
Matched metric-id software@task-clock,name=task-clock@ to task-clock
Matched metric-id software@task-clock,name=task-clock@ to task-clock
Matched metric-id software@cpu-migrations,name=cpu-migrations@ to cpu-migrations
found event duration_time
found event cpu-cycles
found event instructions
Parsing metric events '{cpu-cycles/metric-id=cpu!1cycles/,instructions/metric-id=instructions/}:W,duration_time'
cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
instructions -> armv8_pmuv3_0/metric-id=instructions,instructions/
duration_time -> tool/duration_time/
Matched metric-id cpu-cycles to cpu-cycles
Matched metric-id instructions to instructions
Matched metric-id duration_time to duration_time
found event duration_time
found event cpu-cycles
found event stalled-cycles-frontend
Parsing metric events '{cpu-cycles/metric-id=cpu!1cycles/,stalled-cycles-frontend/metric-id=stalled!1cycles!1frontend/}:W,duration_time'
cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
stalled-cycles-frontend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1frontend,stalled-cycles-frontend/
duration_time -> tool/duration_time/
Matched metric-id cpu-cycles to cpu-cycles
Matched metric-id stalled-cycles-frontend to stalled-cycles-frontend
Matched metric-id duration_time to duration_time
Matched metric-id software@task-clock,name=task-clock@ to task-clock
Matched metric-id cpu-cycles to cpu-cycles
Matched metric-id software@context-switches,name=context-switches@ to context-switches
Matched metric-id software@task-clock,name=task-clock@ to task-clock
found event duration_time
found event branch-misses
found event branches
Parsing metric events '{branch-misses/metric-id=branch!1misses/,branches/metric-id=branches/}:W,duration_time'
branch-misses -> armv8_pmuv3_0/metric-id=branch!1misses,branch-misses/
branches -> armv8_pmuv3_0/metric-id=branches,branches/
duration_time -> tool/duration_time/
Matched metric-id branch-misses to branch-misses
Matched metric-id branches to branches
Matched metric-id duration_time to duration_time
Matched metric-id software@task-clock,name=task-clock@ to task-clock
Matched metric-id branches to branches
found event duration_time
found event cpu-cycles
found event stalled-cycles-backend
Parsing metric events '{cpu-cycles/metric-id=cpu!1cycles/,stalled-cycles-backend/metric-id=stalled!1cycles!1backend/}:W,duration_time'
cpu-cycles -> armv8_pmuv3_0/metric-id=cpu!1cycles,cpu-cycles/
stalled-cycles-backend -> armv8_pmuv3_0/metric-id=stalled!1cycles!1backend,stalled-cycles-backend/
duration_time -> tool/duration_time/
Matched metric-id cpu-cycles to cpu-cycles
Matched metric-id stalled-cycles-backend to stalled-cycles-backend
Matched metric-id duration_time to duration_time
Matched metric-id software@task-clock,name=task-clock@ to task-clock
Matched metric-id duration_time to duration_time
copying metric event for cgroup 'root': context-switches (idx=0)
copying metric event for cgroup 'root': page-faults (idx=1)
copying metric event for cgroup 'root': task-clock (idx=2)
copying metric event for cgroup 'root': cpu-cycles (idx=3)
copying metric event for cgroup 'root': branches (idx=4)
copying metric event for cgroup 'root': cpu-migrations (idx=5)
copying metric event for cgroup 'root': STALL_SLOT_FRONTEND (idx=7)
copying metric event for cgroup 'root': stalled-cycles-backend (idx=29)
copying metric event for cgroup 'root': STALL_SLOT_BACKEND (idx=11)
copying metric event for cgroup 'root': stalled-cycles-backend (idx=15)
copying metric event for cgroup 'root': instructions (idx=20)
copying metric event for cgroup 'root': stalled-cycles-frontend (idx=23)
copying metric event for cgroup 'root': branch-misses (idx=25)
copying metric event for cgroup 'root': context-switches (idx=6)
copying metric event for cgroup 'root': page-faults (idx=8)
copying metric event for cgroup 'root': task-clock (idx=9)
copying metric event for cgroup 'root': cpu-cycles (idx=13)
copying metric event for cgroup 'root': branches (idx=12)
copying metric event for cgroup 'root': cpu-migrations (idx=7)
copying metric event for cgroup 'root': STALL_SLOT_FRONTEND (idx=25)
copying metric event for cgroup 'root': stalled-cycles-backend (idx=19)
copying metric event for cgroup 'root': STALL_SLOT_BACKEND (idx=29)
copying metric event for cgroup 'root': stalled-cycles-backend (idx=20)
copying metric event for cgroup 'root': instructions (idx=15)
copying metric event for cgroup 'root': stalled-cycles-frontend (idx=17)
copying metric event for cgroup 'root': branch-misses (idx=10)
Control descriptor is not initialized
perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
[1] 832866 IOT instruction (core dumped) ./tools/perf/perf stat -v sleep 0.01
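
The assertion above fires when a group member is opened before its leader. As a rough model of that ordering requirement (not the real evsel open path), the sketch below walks a list in order and reports the first member whose leader has not been opened yet. `struct toy_evsel` and `toy_open_in_order` are hypothetical, and the leader's fd state is simplified to a plain int that stays -1 until the event is "opened".

```c
#include <stddef.h>

/* Toy stand-in for an event; all fields are hypothetical simplifications. */
struct toy_evsel {
	const char *name;
	struct toy_evsel *leader; /* a group leader points to itself */
	int fd;                   /* -1 until the event has been opened */
};

/*
 * Open events in list order. Returns the index of the first group member
 * whose leader is still unopened (the case the perf assertion catches),
 * or -1 if every event could be opened.
 */
static int toy_open_in_order(struct toy_evsel **list, size_t n)
{
	int next_fd = 3; /* arbitrary starting descriptor for the sketch */

	for (size_t i = 0; i < n; i++) {
		struct toy_evsel *e = list[i];

		if (e->leader != e && e->leader->fd < 0)
			return (int)i;
		e->fd = next_fd++;
	}
	return -1;
}
```

With a list ordered member-then-leader the function returns 0 without opening anything; with leader-then-member it returns -1 and both events receive descriptors.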
> The PMUs for cpu-cycles and stalled-cycles differ? This may mean we
> also need to be smarter in determining PMUs for legacy events.
>
> It'd be interesting to see what events are coming from the kernel, e.g.:
> ```
> $ ls /sys/bus/event_source/devices/*/events
# ls /sys/bus/event_source/devices/*/events
/sys/bus/event_source/devices/armv8_pmuv3_0/events:
br_mis_pred cti_trigout7 l1d_tlb_refill l2d_tlb_refill mem_access_checked_wr stall_backend_mem
br_mis_pred_retired dtlb_walk l1i_cache l3d_cache memory_error stall_frontend
br_pred exc_return l1i_cache_lmiss l3d_cache_allocate op_retired stall_slot
br_retired exc_taken l1i_cache_refill l3d_cache_lmiss_rd op_spec stall_slot_backend
bus_access inst_retired l1i_tlb l3d_cache_refill remote_access stall_slot_frontend
bus_cycles inst_spec l1i_tlb_refill ld_align_lat sample_collision trb_wrap
cid_write_retired itlb_walk l2d_cache ldst_align_lat sample_feed trcextout0
cnt_cycles l1d_cache l2d_cache_allocate ll_cache_miss_rd sample_filtrate trcextout1
cpu_cycles l1d_cache_lmiss_rd l2d_cache_lmiss_rd ll_cache_rd sample_pop trcextout2
cti_trigout4 l1d_cache_refill l2d_cache_refill mem_access st_align_lat trcextout3
cti_trigout5 l1d_cache_wb l2d_cache_wb mem_access_checked stall ttbr_write_retired
cti_trigout6 l1d_tlb l2d_tlb mem_access_checked_rd stall_backend
/sys/bus/event_source/devices/cs_etm/events:
autofdo
/sys/bus/event_source/devices/nvidia_cnvlink_pmu_0/events:
cycles rd_bytes_rem rd_cum_outs_rem rd_req_rem total_bytes_rem total_req_rem wr_bytes_rem wr_req_rem
rd_bytes_loc rd_cum_outs_loc rd_req_loc total_bytes_loc total_req_loc wr_bytes_loc wr_req_loc
/sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_0/events:
cycles rd_bytes_rem rd_cum_outs_rem rd_req_rem total_bytes_rem total_req_rem wr_bytes_rem wr_req_rem
rd_bytes_loc rd_cum_outs_loc rd_req_loc total_bytes_loc total_req_loc wr_bytes_loc wr_req_loc
/sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_0/events:
cycles rd_bytes_rem rd_cum_outs_rem rd_req_rem total_bytes_rem total_req_rem wr_bytes_rem wr_req_rem
rd_bytes_loc rd_cum_outs_loc rd_req_loc total_bytes_loc total_req_loc wr_bytes_loc wr_req_loc
/sys/bus/event_source/devices/nvidia_pcie_pmu_0/events:
cycles rd_bytes_rem rd_cum_outs_rem rd_req_rem total_bytes_rem total_req_rem wr_bytes_rem wr_req_rem
rd_bytes_loc rd_cum_outs_loc rd_req_loc total_bytes_loc total_req_loc wr_bytes_loc wr_req_loc
/sys/bus/event_source/devices/nvidia_scf_pmu_0/events:
bus_cycles gmem_rd_data scf_cache socket_1_rd_access socket_2_wb_data
cmem_rd_access gmem_rd_outstanding scf_cache_allocate socket_1_rd_data socket_2_wr_access
cmem_rd_data gmem_wb_access scf_cache_refill socket_1_rd_outstanding socket_2_wr_data
cmem_rd_outstanding gmem_wb_data scf_cache_wb socket_1_wb_access socket_3_rd_access
cmem_wb_access gmem_wr_access socket_0_rd_access socket_1_wb_data socket_3_rd_data
cmem_wb_data gmem_wr_data socket_0_rd_data socket_1_wr_access socket_3_rd_outstanding
cmem_wr_access gmem_wr_total_bytes socket_0_rd_outstanding socket_1_wr_data socket_3_wb_access
cmem_wr_data remote_socket_rd_access socket_0_wb_access socket_2_rd_access socket_3_wb_data
cmem_wr_total_bytes remote_socket_rd_data socket_0_wb_data socket_2_rd_data socket_3_wr_access
cycles remote_socket_rd_outstanding socket_0_wr_access socket_2_rd_outstanding socket_3_wr_data
gmem_rd_access remote_socket_wr_total_bytes socket_0_wr_data socket_2_wb_access
/sys/bus/event_source/devices/smmuv3_pmcg_11002/events:
config_cache_miss config_struct_access cycles pcie_ats_trans_rq tlb_miss transaction trans_table_walk_access
/sys/bus/event_source/devices/smmuv3_pmcg_11042/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_11062/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_11082/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_110a2/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_12002/events:
config_cache_miss config_struct_access cycles pcie_ats_trans_rq tlb_miss transaction trans_table_walk_access
/sys/bus/event_source/devices/smmuv3_pmcg_12042/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_12062/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_12082/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_120a2/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_15002/events:
config_cache_miss config_struct_access cycles pcie_ats_trans_rq tlb_miss transaction trans_table_walk_access
/sys/bus/event_source/devices/smmuv3_pmcg_15042/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_15062/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_15082/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_150a2/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_16002/events:
config_cache_miss config_struct_access cycles pcie_ats_trans_rq tlb_miss transaction trans_table_walk_access
/sys/bus/event_source/devices/smmuv3_pmcg_16042/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_16062/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_5002/events:
config_cache_miss config_struct_access cycles pcie_ats_trans_rq tlb_miss transaction trans_table_walk_access
/sys/bus/event_source/devices/smmuv3_pmcg_5042/events:
cycles pcie_ats_trans_passed tlb_miss transaction
/sys/bus/event_source/devices/smmuv3_pmcg_5062/events:
cycles pcie_ats_trans_passed tlb_miss transaction
> ```
> and the cpuid to match it up with the json.
> ```
> $ perf stat -v sleep 1 2>&1 |head -1
# perf stat -v sleep 1 2>&1 |head -1
Using CPUID 0x00000000410fd4f0
> this information is in the verbose output too:
> ```
> $ perf stat -vv sleep 1
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 144
config 0x3 (PERF_COUNT_SW_CONTEXT_SWITCHES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 144
config 0x4 (PERF_COUNT_SW_CPU_MIGRATIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 4
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 144
config 0x2 (PERF_COUNT_SW_PAGE_FAULTS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 144
config 0x1 (PERF_COUNT_SW_TASK_CLOCK)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x5 (PERF_COUNT_HW_BRANCH_MISSES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 8
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 8 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 11
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 12
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x1 (PERF_COUNT_HW_INSTRUCTIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 12 flags 0x8 = 13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 14
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x7 (PERF_COUNT_HW_STALLED_CYCLES_FRONTEND)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 14 flags 0x8 = 15
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 16
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x8 (PERF_COUNT_HW_STALLED_CYCLES_BACKEND)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 16 flags 0x8 = 17
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x8 (PERF_COUNT_HW_STALLED_CYCLES_BACKEND)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 18
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x1 (PERF_COUNT_HW_INSTRUCTIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 18 flags 0x8 = 19
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 144
config 0x7 (PERF_COUNT_HW_STALLED_CYCLES_FRONTEND)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 18 flags 0x8 = 20
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 4294967294 (tool)
size 144
config 0x1 (duration_time)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 144
config 0x10 (br_mis_pred)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd -1 flags 0x8 = 21
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 144
config 0x3b (op_spec)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 21 flags 0x8 = 22
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 144
config 0x3f (stall_slot)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 21 flags 0x8 = 23
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 144
config 0x11 (cpu_cycles)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 21 flags 0x8 = 24
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 144
config 0x3a (op_retired)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 865887 cpu -1 group_fd 21 flags 0x8 = 25
------------------------------------------------------------
perf_event_attr:
type 10 (armv8_pmuv3_0)
size 144
config 0x11 (cpu_cycles)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
Thanks for your help,
--breno
Hi Ian,
On Thu, Feb 05, 2026 at 08:59:07AM -0800, Ian Rogers wrote:
> On Thu, Feb 5, 2026 at 3:46 AM Breno Leitao <leitao@debian.org> wrote:
> >
> > Perf stat is crashing on arm64 hosts with the following issue:
> >
> > # make -C tools/perf DEBUG=1
> > # perf stat sleep 1
> > perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> > [1] 1220794 IOT instruction (core dumped) ./perf stat
> >
> > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > Sort default events/metrics") compares events based on their individual
> > properties. This can cause events from different groups to be
> > interleaved, resulting in group members appearing before their leaders
> > in the sorted evlist.
>
> Hi, sorry for the issue. I can see what you're saying but why is this
> an arm64 issue? The legacy Default metrics are common to all
> architectures:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
Since you mention common metrics: I found that the common metrics do
not work on Arm64 (I built both with NO_JEVENTS=1 and with jevents
enabled, and neither works).
The latest perf produces no output at all if the CPU type is missing
from the json files and perf has to fall back to the common metrics.
The failure path is:
add_default_events()
  metricgroup__parse_groups()
    pmu_metrics_table__find() => return NULL
In my case, pmu_metrics_table__find() always returns NULL; as a result,
`perf stat sleep 1` bails out directly without any output.
I expect Breno's environment has the corresponding CPU json files, so it
is possibly different from my test machine.
Thanks,
Leo
On Thu, Feb 05, 2026 at 05:39:18PM +0000, Leo Yan wrote:
> > > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > > Sort default events/metrics") compares events based on their individual
> > > properties. This can cause events from different groups to be
> > > interleaved, resulting in group members appearing before their leaders
> > > in the sorted evlist.
> >
> > Hi, sorry for the issue. I can see what you're saying but why is this
> > an arm64 issue? The legacy Default metrics are common to all
> > architectures:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
>
> Since you mention common metrics: I found that the common metrics do
> not work on Arm64 (I built both with NO_JEVENTS=1 and with jevents
> enabled, and neither works).
>
> The latest perf produces no output at all if the CPU type is missing
> from the json files and perf has to fall back to the common metrics.
> The failure path is:
>
> add_default_events()
>   metricgroup__parse_groups()
>     pmu_metrics_table__find() => return NULL
>
> In my case, pmu_metrics_table__find() always returns NULL; as a result,
> `perf stat sleep 1` bails out directly without any output.
>
> I expect Breno's environment has the corresponding CPU json files, so it
> is possibly different from my test machine.
On my local env, I need a fix:
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index e4d00f6b2b5d..f74acc206856 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -3237,14 +3237,6 @@ const struct pmu_events_table *perf_pmu__default_core_events_table(void)
 	return NULL;
 }

-const struct pmu_metrics_table *pmu_metrics_table__find(void)
-{
-	struct perf_cpu cpu = {-1};
-	const struct pmu_events_map *map = map_for_cpu(cpu);
-
-	return map ? &map->metric_table : NULL;
-}
-
 const struct pmu_metrics_table *pmu_metrics_table__default(void)
 {
 	int i = 0;
@@ -3261,6 +3253,17 @@ const struct pmu_metrics_table *pmu_metrics_table__default(void)
 	return NULL;
 }

+const struct pmu_metrics_table *pmu_metrics_table__find(void)
+{
+	struct perf_cpu cpu = {-1};
+	const struct pmu_events_map *map = map_for_cpu(cpu);
+
+	if (map)
+		return &map->metric_table;
+
+	return pmu_metrics_table__default();
+}
+
I don't have a deep understanding of jevents, but it seems to me that
Breno's issue is different from mine. Please kindly confirm.
Thanks,
Leo
On Thu, Feb 5, 2026 at 9:52 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Thu, Feb 05, 2026 at 05:39:18PM +0000, Leo Yan wrote:
>
> > > > The sorting function introduced by commit a745c0831c15c ("perf stat:
> > > > Sort default events/metrics") compares events based on their individual
> > > > properties. This can cause events from different groups to be
> > > > interleaved, resulting in group members appearing before their leaders
> > > > in the sorted evlist.
> > >
> > > Hi, sorry for the issue. I can see what you're saying but why is this
> > > an arm64 issue? The legacy Default metrics are common to all
> > > architectures:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common/metrics.json?h=perf-tools-next
> >
> > Since you mention common metrics: I found that the common metrics do
> > not work on Arm64 (I built both with NO_JEVENTS=1 and with jevents
> > enabled, and neither works).
> >
> > The latest perf produces no output at all if the CPU type is missing
> > from the json files and perf has to fall back to the common metrics.
> > The failure path is:
> >
> > add_default_events()
> >   metricgroup__parse_groups()
> >     pmu_metrics_table__find() => return NULL
> >
Returning NULL there is correct, but the early return is wrong: the
metric code was updated to always consider the default table and to
skip a NULL table:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=perf-tools-next#n430
I'll send a patch for the early return.
> > In my case, pmu_metrics_table__find() always returns NULL; as a result,
> > `perf stat sleep 1` bails out directly without any output.
> >
> > I expect Breno's environment has the corresponding CPU json files, so it
> > is possibly different from my test machine.
>
> On my local env, I need a fix:
>
> diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
> index e4d00f6b2b5d..f74acc206856 100644
> --- a/tools/perf/pmu-events/empty-pmu-events.c
> +++ b/tools/perf/pmu-events/empty-pmu-events.c
> @@ -3237,14 +3237,6 @@ const struct pmu_events_table *perf_pmu__default_core_events_table(void)
> return NULL;
> }
>
> -const struct pmu_metrics_table *pmu_metrics_table__find(void)
> -{
> - struct perf_cpu cpu = {-1};
> - const struct pmu_events_map *map = map_for_cpu(cpu);
> -
> - return map ? &map->metric_table : NULL;
> -}
> -
> const struct pmu_metrics_table *pmu_metrics_table__default(void)
> {
> int i = 0;
> @@ -3261,6 +3253,17 @@ const struct pmu_metrics_table *pmu_metrics_table__default(void)
> return NULL;
> }
>
> +const struct pmu_metrics_table *pmu_metrics_table__find(void)
> +{
> + struct perf_cpu cpu = {-1};
> + const struct pmu_events_map *map = map_for_cpu(cpu);
> +
> + if (map)
> + return &map->metric_table;
> +
> + return pmu_metrics_table__default();
> +}
> +
>
> I don't have a deep understanding of jevents, but it seems to me that
> Breno's issue is different from mine. Please kindly confirm.
I think it is a different issue: they have metrics while you don't.
Your report does highlight that we're missing a NO_JEVENTS=1 build-test,
although the build is working for me. I'll send out two patches for
these issues.
Thanks,
Ian
> Thanks,
> Leo
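For readers following along, here is a minimal sketch of the "always
consider the default table and skip a NULL table" idea discussed above.
It is not the perf implementation: struct toy_metrics_table,
toy_default_table, toy_table_has() and toy_find_metric() are all
hypothetical names, and the metric names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct pmu_metrics_table; not perf's real layout. */
struct toy_metrics_table {
	const char *const *names;
	size_t nr;
};

/* Made-up metric names, for illustration only. */
static const char *const toy_default_names[] = {
	"toy_ipc", "toy_branch_miss_rate"
};

static const struct toy_metrics_table toy_default_table = {
	.names = toy_default_names,
	.nr = sizeof(toy_default_names) / sizeof(toy_default_names[0]),
};

/* Search one table; a NULL table is skipped rather than treated as an error. */
static int toy_table_has(const struct toy_metrics_table *table,
			 const char *metric)
{
	if (!table)
		return 0;

	for (size_t i = 0; i < table->nr; i++) {
		if (!strcmp(table->names[i], metric))
			return 1;
	}
	return 0;
}

/*
 * Always consult both tables. cpu_table may be NULL when no CPU-specific
 * json matched; that must not stop the lookup in the default table.
 */
static int toy_find_metric(const struct toy_metrics_table *cpu_table,
			   const char *metric)
{
	return toy_table_has(cpu_table, metric) ||
	       toy_table_has(&toy_default_table, metric);
}
```

With this shape, toy_find_metric(NULL, "toy_ipc") still succeeds, so a
missing CPU-specific table does not need a fallback inside the lookup
function itself.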
On Thu, Feb 05, 2026 at 03:46:31AM -0800, Breno Leitao wrote:
> Perf stat is crashing on arm64 hosts with the following issue:
>
> # make -C tools/perf DEBUG=1
> # perf stat sleep 1
> perf: util/evsel.c:2034: get_group_fd: Assertion `!(!leader->core.fd)' failed.
> [1] 1220794 IOT instruction (core dumped) ./perf stat
>
> The sorting function introduced by commit a745c0831c15c ("perf stat:
> Sort default events/metrics") compares events based on their individual
> properties. This can cause events from different groups to be
> interleaved, resulting in group members appearing before their leaders
> in the sorted evlist.
>
> When the iterator opens events in list order, a group member may be
> processed before its leader has been opened.
>
> For example, CPU_CYCLES (idx=32) with leader STALL_SLOT_BACKEND (idx=37)
> could be sorted before its leader, causing the crash when CPU_CYCLES
> tries to get its group fd from the not-yet-opened leader.
>
> Fix this by comparing events based on their leader's attributes instead
> of their own attributes when the events are in different groups. This
> ensures all members of a group share the same sort key as their leader,
> keeping groups together and guaranteeing leaders are opened before their
> members.
>
> Reported-by: Denis Yaroshevskiy <dyaroshev@meta.com>
> Fixes: a745c0831c15c ("perf stat: Sort default events/metrics")
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> Cc: linux-arm-kernel@lists.infradead.org
> ---
> tools/perf/builtin-stat.c | 26 +++++++++++++++++---------
> 1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index ab40d85fb1259..3a423ca31d8d3 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1938,25 +1938,33 @@ static int default_evlist_evsel_cmp(void *priv __maybe_unused,
> const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
> const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
> const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
> + const struct evsel *lhs_leader = evsel__leader(lhs);
> + const struct evsel *rhs_leader = evsel__leader(rhs);
>
> - if (evsel__leader(lhs) == evsel__leader(rhs)) {
> + if (lhs_leader == rhs_leader) {
> /* Within the same group, respect the original order. */
> return lhs_core->idx - rhs_core->idx;
> }
>
> + /*
> + * Compare using leader's attributes so that all members of a group
> + * stay together. This ensures leaders are opened before their members.
> + */
> +
> /* Sort default metrics evsels first, and default show events before those. */
> - if (lhs->default_metricgroup != rhs->default_metricgroup)
> - return lhs->default_metricgroup ? -1 : 1;
> + if (lhs_leader->default_metricgroup != rhs_leader->default_metricgroup)
> + return lhs_leader->default_metricgroup ? -1 : 1;
>
> - if (lhs->default_show_events != rhs->default_show_events)
> - return lhs->default_show_events ? -1 : 1;
> + if (lhs_leader->default_show_events != rhs_leader->default_show_events)
> + return lhs_leader->default_show_events ? -1 : 1;
>
> /* Sort by PMU type (prefers legacy types first). */
> - if (lhs->pmu != rhs->pmu)
> - return lhs->pmu->type - rhs->pmu->type;
> + if (lhs_leader->pmu != rhs_leader->pmu)
> + return lhs_leader->pmu->type - rhs_leader->pmu->type;
>
> - /* Sort by name. */
> - return strcmp(evsel__name((struct evsel *)lhs), evsel__name((struct evsel *)rhs));
> + /* Sort by leader's name. */
> + return strcmp(evsel__name((struct evsel *)lhs_leader),
> + evsel__name((struct evsel *)rhs_leader));
> }
>
> /*
>
> ---
> base-commit: 5fd0a1df5d05ad066e5618ccdd3d0fa6cb686c27
> change-id: 20260205-perf_stat-a0a2a37e21c5
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
Tested-by: Dmitry Ilvokhin <d@ilvokhin.com>
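
The leader-keyed comparison in the patch above can be tried outside of
perf with a small stand-alone sketch. Everything here is an assumption
made for illustration: struct toy_evsel is a hypothetical stand-in for
struct evsel, the grouping of the four events is invented (only the
event names and PMU type 10 are borrowed from this thread), and the
final tie-break on the leader's idx is an addition of the sketch so
that qsort() sees a total order.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct evsel; not perf's real layout. */
struct toy_evsel {
	int idx;                        /* original insertion order */
	int pmu_type;                   /* e.g. 10 for armv8_pmuv3_0 */
	const char *name;
	const struct toy_evsel *leader; /* a leader points to itself */
};

/* Compare two events using their leaders' attributes as the sort key. */
static int toy_cmp_by_leader(const void *l, const void *r)
{
	const struct toy_evsel *lhs = *(const struct toy_evsel *const *)l;
	const struct toy_evsel *rhs = *(const struct toy_evsel *const *)r;
	const struct toy_evsel *ll = lhs->leader, *rl = rhs->leader;
	int ret;

	/* Within the same group, respect the original order. */
	if (ll == rl)
		return lhs->idx - rhs->idx;

	if (ll->pmu_type != rl->pmu_type)
		return ll->pmu_type - rl->pmu_type;

	ret = strcmp(ll->name, rl->name);
	/* Sketch-only tie-break so distinct groups never compare equal. */
	return ret ? ret : ll->idx - rl->idx;
}

/* Return 1 if no event appears before its own leader, else 0. */
static int toy_leaders_first(const struct toy_evsel *const *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			if (v[j] == v[i]->leader)
				return 0;
		}
	}
	return 1;
}

/* Two invented groups: {stall_slot_backend, cpu_cycles}, {br_mis_pred, op_spec}. */
static struct toy_evsel toy_events[4] = {
	{ 0, 10, "stall_slot_backend", &toy_events[0] },
	{ 1, 10, "cpu_cycles",         &toy_events[0] },
	{ 2, 10, "br_mis_pred",        &toy_events[2] },
	{ 3, 10, "op_spec",            &toy_events[2] },
};

/* Sort pointers to the events and check that leaders precede members. */
static int toy_sort_and_check(void)
{
	const struct toy_evsel *order[4];

	for (size_t i = 0; i < 4; i++)
		order[i] = &toy_events[i];

	qsort(order, 4, sizeof(order[0]), toy_cmp_by_leader);
	return toy_leaders_first(order, 4);
}
```

Because every member of a group compares with its leader's key, the two
groups end up contiguous, and the within-group idx ordering keeps each
leader ahead of its members in this toy data.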