The user and system time events can record on different CPUs, but for
all other events a single CPU map of just CPU 0 makes sense. In
parse-events, detect a tool PMU and then pass the perf_event_attr so
that the tool_pmu can return CPUs specific to the event. This avoids
a CPU map of all online CPUs being used for events like
duration_time, and so keeps the evlist CPUs from containing CPUs for
which duration_time just gives 0. Minimizing the evlist CPUs can
remove unnecessary sched_setaffinity syscalls that delay metric
calculations.
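
For example, a minimal sketch (illustrative only, not part of this
change; it assumes the perf tree's internal headers and cpus_for() is
a made-up helper for this example) of what the new tool_pmu__cpus()
returns:

/* Illustrative only, not part of this change. */
#include <linux/perf_event.h>
#include <perf/cpumap.h>
#include "util/tool_pmu.h"

static struct perf_cpu_map *cpus_for(enum tool_pmu_event ev)
{
	struct perf_event_attr attr = { .config = ev };

	return tool_pmu__cpus(&attr);
}

/*
 * cpus_for(TOOL_PMU__EVENT_DURATION_TIME) -> a map of just CPU 0
 * cpus_for(TOOL_PMU__EVENT_USER_TIME)     -> all online CPUs
 */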
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/parse-events.c | 9 +++++++--
tools/perf/util/tool_pmu.c | 19 +++++++++++++++++++
tools/perf/util/tool_pmu.h | 1 +
3 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 0c0dc20b1c13..7b2422ccb554 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -30,6 +30,7 @@
#include "util/event.h"
#include "util/bpf-filter.h"
#include "util/stat.h"
+#include "util/tool_pmu.h"
#include "util/util.h"
#include "tracepoint.h"
#include <api/fs/tracing_path.h>
@@ -227,8 +228,12 @@ __add_event(struct list_head *list, int *idx,
if (pmu) {
is_pmu_core = pmu->is_core;
pmu_cpus = perf_cpu_map__get(pmu->cpus);
- if (perf_cpu_map__is_empty(pmu_cpus))
- pmu_cpus = cpu_map__online();
+ if (perf_cpu_map__is_empty(pmu_cpus)) {
+ if (perf_pmu__is_tool(pmu))
+ pmu_cpus = tool_pmu__cpus(attr);
+ else
+ pmu_cpus = cpu_map__online();
+ }
} else {
is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
attr->type == PERF_TYPE_HW_CACHE);
diff --git a/tools/perf/util/tool_pmu.c b/tools/perf/util/tool_pmu.c
index 6a9df3dc0e07..37c4eae0bef1 100644
--- a/tools/perf/util/tool_pmu.c
+++ b/tools/perf/util/tool_pmu.c
@@ -2,6 +2,7 @@
#include "cgroup.h"
#include "counts.h"
#include "cputopo.h"
+#include "debug.h"
#include "evsel.h"
#include "pmu.h"
#include "print-events.h"
@@ -13,6 +14,7 @@
#include <api/fs/fs.h>
#include <api/io.h>
#include <internal/threadmap.h>
+#include <perf/cpumap.h>
#include <perf/threadmap.h>
#include <fcntl.h>
#include <strings.h>
@@ -109,6 +111,23 @@ const char *evsel__tool_pmu_event_name(const struct evsel *evsel)
return tool_pmu__event_to_str(evsel->core.attr.config);
}
+struct perf_cpu_map *tool_pmu__cpus(struct perf_event_attr *attr)
+{
+ static struct perf_cpu_map *cpu0_map;
+ enum tool_pmu_event event = (enum tool_pmu_event)attr->config;
+
+ if (event <= TOOL_PMU__EVENT_NONE || event >= TOOL_PMU__EVENT_MAX) {
+ pr_err("Invalid tool PMU event config %llx\n", attr->config);
+ return NULL;
+ }
+ if (event == TOOL_PMU__EVENT_USER_TIME || event == TOOL_PMU__EVENT_SYSTEM_TIME)
+ return cpu_map__online();
+
+ if (!cpu0_map)
+ cpu0_map = perf_cpu_map__new_int(0);
+ return perf_cpu_map__get(cpu0_map);
+}
+
static bool read_until_char(struct io *io, char e)
{
int c;
diff --git a/tools/perf/util/tool_pmu.h b/tools/perf/util/tool_pmu.h
index f1714001bc1d..ea343d1983d3 100644
--- a/tools/perf/util/tool_pmu.h
+++ b/tools/perf/util/tool_pmu.h
@@ -46,6 +46,7 @@ bool tool_pmu__read_event(enum tool_pmu_event ev,
u64 tool_pmu__cpu_slots_per_cycle(void);
bool perf_pmu__is_tool(const struct perf_pmu *pmu);
+struct perf_cpu_map *tool_pmu__cpus(struct perf_event_attr *attr);
bool evsel__is_tool(const struct evsel *evsel);
enum tool_pmu_event evsel__tool_event(const struct evsel *evsel);
--
2.51.2.1041.gc1ab5b90ca-goog
Hi,
On 2025-11-13 10:05:13 -0800, Ian Rogers wrote:
> The user and system time events can record on different CPUs, but for
> all other events a single CPU map of just CPU 0 makes sense. In
> parse-events, detect a tool PMU and then pass the perf_event_attr so
> that the tool_pmu can return CPUs specific to the event. This avoids
> a CPU map of all online CPUs being used for events like
> duration_time, and so keeps the evlist CPUs from containing CPUs for
> which duration_time just gives 0. Minimizing the evlist CPUs can
> remove unnecessary sched_setaffinity syscalls that delay metric
> calculations.
I was just testing v6.19-rc* and noticed that
perf stat -C $somecpu sleep 1
segfaults.
I bisected that down to this change (d8d8a0b3603a9a8fa207cf9e4f292e81dc5d1008).
$ git describe
v6.18-rc1-116-gd8d8a0b3603a9
$ gdb --args perf stat -C11 sleep 1
(gdb) r
Starting program: /home/andres/bin/bin/perf stat -C11 sleep 1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 843870]
Performance counter stats for 'CPU(s) 11':
16 context-switches # 16.0 cs/sec cs_per_second
1,001.66 msec cpu-clock
Program received signal SIGSEGV, Segmentation fault.
prepare_metric (config=config@entry=0x55555603bd80 <stat_config>, mexp=mexp@entry=0x5555560a40f0, evsel=evsel@entry=0x555556093e20,
pctx=pctx@entry=0x5555560a7150, aggr_idx=aggr_idx@entry=0) at util/stat-shadow.c:85
85 aggr = &ps->aggr[is_tool_time ? tool_aggr_idx : aggr_idx];
(gdb) bt
#0 prepare_metric (config=config@entry=0x55555603bd80 <stat_config>, mexp=mexp@entry=0x5555560a40f0, evsel=evsel@entry=0x555556093e20,
pctx=pctx@entry=0x5555560a7150, aggr_idx=aggr_idx@entry=0) at util/stat-shadow.c:85
#1 0x00005555557be0eb in generic_metric (config=<optimized out>, mexp=0x5555560a40f0, evsel=<optimized out>, aggr_idx=<optimized out>, out=0x7fffffff76c0)
at util/stat-shadow.c:146
#2 perf_stat__print_shadow_stats_metricgroup (config=config@entry=0x55555603bd80 <stat_config>, evsel=<optimized out>, aggr_idx=0,
num=num@entry=0x7fffffff7604, from=from@entry=0x0, out=0x7fffffff76c0) at util/stat-shadow.c:307
#3 0x00005555557be4c7 in perf_stat__print_shadow_stats (config=config@entry=0x55555603bd80 <stat_config>, evsel=evsel@entry=0x555556093e20,
aggr_idx=aggr_idx@entry=0, out=out@entry=0x7fffffff76c0) at util/stat-shadow.c:325
#4 0x00005555557c03a3 in printout (config=<optimized out>, os=0x7fffffff78b0, uval=<optimized out>, run=1001660763, ena=<optimized out>,
noise=<optimized out>, aggr_idx=0) at util/stat-display.c:874
#5 0x00005555557c1424 in print_counter_aggrdata (config=0x55555603bd80 <stat_config>, counter=0x555556093e20, aggr_idx=0, os=<optimized out>)
at util/stat-display.c:1013
#6 0x00005555557c3051 in print_counter (config=<optimized out>, counter=<optimized out>, os=<optimized out>) at util/stat-display.c:1127
#7 print_counter (config=<optimized out>, counter=<optimized out>, os=<optimized out>) at util/stat-display.c:1117
#8 evlist__print_counters (evlist=<optimized out>, config=config@entry=0x55555603bd80 <stat_config>, _target=_target@entry=0x5555560412e0 <target>,
ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffdb20) at util/stat-display.c:1600
#9 0x00005555555d0d27 in print_counters (ts=0x0, argc=2, argv=0x7fffffffdb20) at builtin-stat.c:1070
#10 print_counters (ts=0x0, argc=2, argv=0x7fffffffdb20) at builtin-stat.c:1062
#11 cmd_stat (argc=2, argv=0x7fffffffdb20) at builtin-stat.c:2949
#12 0x000055555562fee2 in run_builtin (p=p@entry=0x55555602df48 <commands+360>, argc=argc@entry=4, argv=argv@entry=0x7fffffffdb20) at perf.c:349
#13 0x00005555556301ce in handle_internal_command (argc=argc@entry=4, argv=argv@entry=0x7fffffffdb20) at perf.c:401
#14 0x00005555555a8d33 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:445
#15 main (argc=<optimized out>, argv=0x7fffffffdb20) at perf.c:553
(gdb) bt full
#0 prepare_metric (config=config@entry=0x55555603bd80 <stat_config>, mexp=mexp@entry=0x5555560a40f0, evsel=evsel@entry=0x555556093e20, pctx=pctx@entry=0x5555560a7150, aggr_idx=aggr_idx@entry=0) at util/stat-shadow.c:85
val = <optimized out>
source_count = 0
tool_aggr_idx = 0
is_tool_time = true
ps = 0x0
aggr = <optimized out>
n = <optimized out>
metric_events = <optimized out>
metric_refs = 0x0
i = 1
#1 0x00005555557be0eb in generic_metric (config=<optimized out>, mexp=0x5555560a40f0, evsel=<optimized out>, aggr_idx=<optimized out>, out=0x7fffffff76c0) at util/stat-shadow.c:146
print_metric = 0x5555557c1b70 <print_metric_std>
metric_name = 0x5555560ae550 "CPUs_utilized"
metric_expr = 0x555555ea5bb5 "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)"
metric_threshold = 0x0
metric_unit = 0x555555ea5c4d "1CPUs"
metric_events = 0x5555560ae490
runtime = 0
pctx = 0x5555560a7150
ratio = 15.973479072960373
scale = 1
threshold = 4.9406564584124654e-324
i = <optimized out>
ctxp = 0x7fffffff78b0
thresh = METRIC_THRESHOLD_UNKNOWN
#2 perf_stat__print_shadow_stats_metricgroup (config=config@entry=0x55555603bd80 <stat_config>, evsel=<optimized out>, aggr_idx=0, num=num@entry=0x7fffffff7604, from=from@entry=0x0, out=0x7fffffff76c0) at util/stat-shadow.c:307
me = 0x5555560877d0
mexp = 0x5555560a40f0
ctxp = 0x7fffffff78b0
header_printed = false
name = 0x555555879b28 ""
metric_events = <optimized out>
#3 0x00005555557be4c7 in perf_stat__print_shadow_stats (config=config@entry=0x55555603bd80 <stat_config>, evsel=evsel@entry=0x555556093e20, aggr_idx=aggr_idx@entry=0, out=out@entry=0x7fffffff76c0) at util/stat-shadow.c:325
print_metric = 0x5555557c1b70 <print_metric_std>
ctxp = 0x7fffffff78b0
num = 1
#4 0x00005555557c03a3 in printout (config=<optimized out>, os=0x7fffffff78b0, uval=<optimized out>, run=1001660763, ena=<optimized out>, noise=<optimized out>, aggr_idx=0) at util/stat-display.c:874
out = {ctx = 0x7fffffff78b0, print_metric = 0x5555557c1b70 <print_metric_std>, new_line = 0x5555557be5a0 <new_line_std>, print_metricgroup_header = 0x5555557bea60 <print_metricgroup_header_std>, force_header = false}
pm = <optimized out>
nl = <optimized out>
pmh = <optimized out>
ok = <optimized out>
counter = 0x555556093e20
#5 0x00005555557c1424 in print_counter_aggrdata (config=0x55555603bd80 <stat_config>, counter=0x555556093e20, aggr_idx=0, os=<optimized out>) at util/stat-display.c:1013
output = 0x7ffff69f24e0 <_IO_2_1_stderr_>
ena = 1001660763
run = <optimized out>
val = <optimized out>
uval = <optimized out>
ps = <optimized out>
aggr = <optimized out>
id = {thread_idx = -1, node = -1, socket = -1, die = -1, cluster = -1, cache_lvl = -1, cache = -1, core = -1, cpu = {cpu = 0}}
avg = <optimized out>
metric_only = false
...
whereas on
$ git describe
v6.18-rc1-115-gd702c0f4af6e0
$ perf stat -C11 sleep 1
Performance counter stats for 'CPU(s) 11':
8 context-switches # 8.0 cs/sec cs_per_second
1,001.48 msec cpu-clock # 1.0 CPUs CPUs_utilized
0 cpu-migrations # 0.0 migrations/sec migrations_per_second
3 page-faults # 3.0 faults/sec page_faults_per_second
277,825 branch-misses # 0.3 % branch_miss_rate (33.11%)
102,546,153 branches # 102.4 M/sec branch_frequency (33.51%)
2,501,242,351 cpu-cycles # 2.5 GHz cycles_frequency (33.55%)
317,348,809 instructions # 0.1 instructions insn_per_cycle (33.55%)
1.001449562 seconds time elapsed
it works without a problem.
Greetings,
Andres Freund
On Tue, Feb 3, 2026 at 9:37 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-11-13 10:05:13 -0800, Ian Rogers wrote:
> > The user and system time events can record on different CPUs, but for
> > all other events a single CPU map of just CPU 0 makes sense. In
> > parse-events, detect a tool PMU and then pass the perf_event_attr so
> > that the tool_pmu can return CPUs specific to the event. This avoids
> > a CPU map of all online CPUs being used for events like
> > duration_time, and so keeps the evlist CPUs from containing CPUs for
> > which duration_time just gives 0. Minimizing the evlist CPUs can
> > remove unnecessary sched_setaffinity syscalls that delay metric
> > calculations.
>
> I was just testing v6.19-rc* and noticed that
> perf stat -C $somecpu sleep 1
> segfaults.
>
> I bisected that down to this change (d8d8a0b3603a9a8fa207cf9e4f292e81dc5d1008).
>
> $ git describe
> v6.18-rc1-116-gd8d8a0b3603a9
>
> $ gdb --args perf stat -C11 sleep 1
> (gdb) r
> Starting program: /home/andres/bin/bin/perf stat -C11 sleep 1
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
> [Detaching after fork from child process 843870]
>
> Performance counter stats for 'CPU(s) 11':
>
> 16 context-switches # 16.0 cs/sec cs_per_second
> 1,001.66 msec cpu-clock
> Program received signal SIGSEGV, Segmentation fault.
> prepare_metric (config=config@entry=0x55555603bd80 <stat_config>, mexp=mexp@entry=0x5555560a40f0, evsel=evsel@entry=0x555556093e20,
> pctx=pctx@entry=0x5555560a7150, aggr_idx=aggr_idx@entry=0) at util/stat-shadow.c:85
> 85 aggr = &ps->aggr[is_tool_time ? tool_aggr_idx : aggr_idx];
> (gdb) bt
> #0 prepare_metric (config=config@entry=0x55555603bd80 <stat_config>, mexp=mexp@entry=0x5555560a40f0, evsel=evsel@entry=0x555556093e20,
> pctx=pctx@entry=0x5555560a7150, aggr_idx=aggr_idx@entry=0) at util/stat-shadow.c:85
> #1 0x00005555557be0eb in generic_metric (config=<optimized out>, mexp=0x5555560a40f0, evsel=<optimized out>, aggr_idx=<optimized out>, out=0x7fffffff76c0)
> at util/stat-shadow.c:146
> #2 perf_stat__print_shadow_stats_metricgroup (config=config@entry=0x55555603bd80 <stat_config>, evsel=<optimized out>, aggr_idx=0,
> num=num@entry=0x7fffffff7604, from=from@entry=0x0, out=0x7fffffff76c0) at util/stat-shadow.c:307
> #3 0x00005555557be4c7 in perf_stat__print_shadow_stats (config=config@entry=0x55555603bd80 <stat_config>, evsel=evsel@entry=0x555556093e20,
> aggr_idx=aggr_idx@entry=0, out=out@entry=0x7fffffff76c0) at util/stat-shadow.c:325
> #4 0x00005555557c03a3 in printout (config=<optimized out>, os=0x7fffffff78b0, uval=<optimized out>, run=1001660763, ena=<optimized out>,
> noise=<optimized out>, aggr_idx=0) at util/stat-display.c:874
> #5 0x00005555557c1424 in print_counter_aggrdata (config=0x55555603bd80 <stat_config>, counter=0x555556093e20, aggr_idx=0, os=<optimized out>)
> at util/stat-display.c:1013
> #6 0x00005555557c3051 in print_counter (config=<optimized out>, counter=<optimized out>, os=<optimized out>) at util/stat-display.c:1127
> #7 print_counter (config=<optimized out>, counter=<optimized out>, os=<optimized out>) at util/stat-display.c:1117
> #8 evlist__print_counters (evlist=<optimized out>, config=config@entry=0x55555603bd80 <stat_config>, _target=_target@entry=0x5555560412e0 <target>,
> ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffdb20) at util/stat-display.c:1600
> #9 0x00005555555d0d27 in print_counters (ts=0x0, argc=2, argv=0x7fffffffdb20) at builtin-stat.c:1070
> #10 print_counters (ts=0x0, argc=2, argv=0x7fffffffdb20) at builtin-stat.c:1062
> #11 cmd_stat (argc=2, argv=0x7fffffffdb20) at builtin-stat.c:2949
> #12 0x000055555562fee2 in run_builtin (p=p@entry=0x55555602df48 <commands+360>, argc=argc@entry=4, argv=argv@entry=0x7fffffffdb20) at perf.c:349
> #13 0x00005555556301ce in handle_internal_command (argc=argc@entry=4, argv=argv@entry=0x7fffffffdb20) at perf.c:401
> #14 0x00005555555a8d33 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:445
> #15 main (argc=<optimized out>, argv=0x7fffffffdb20) at perf.c:553
>
>
> (gdb) bt full
> #0 prepare_metric (config=config@entry=0x55555603bd80 <stat_config>, mexp=mexp@entry=0x5555560a40f0, evsel=evsel@entry=0x555556093e20, pctx=pctx@entry=0x5555560a7150, aggr_idx=aggr_idx@entry=0) at util/stat-shadow.c:85
> val = <optimized out>
> source_count = 0
> tool_aggr_idx = 0
> is_tool_time = true
> ps = 0x0
> aggr = <optimized out>
> n = <optimized out>
> metric_events = <optimized out>
> metric_refs = 0x0
> i = 1
> #1 0x00005555557be0eb in generic_metric (config=<optimized out>, mexp=0x5555560a40f0, evsel=<optimized out>, aggr_idx=<optimized out>, out=0x7fffffff76c0) at util/stat-shadow.c:146
> print_metric = 0x5555557c1b70 <print_metric_std>
> metric_name = 0x5555560ae550 "CPUs_utilized"
> metric_expr = 0x555555ea5bb5 "(software@cpu\\-clock\\,name\\=cpu\\-clock@ if #target_cpu else software@task\\-clock\\,name\\=task\\-clock@) / (duration_time * 1e9)"
> metric_threshold = 0x0
> metric_unit = 0x555555ea5c4d "1CPUs"
> metric_events = 0x5555560ae490
> runtime = 0
> pctx = 0x5555560a7150
> ratio = 15.973479072960373
> scale = 1
> threshold = 4.9406564584124654e-324
> i = <optimized out>
> ctxp = 0x7fffffff78b0
> thresh = METRIC_THRESHOLD_UNKNOWN
> #2 perf_stat__print_shadow_stats_metricgroup (config=config@entry=0x55555603bd80 <stat_config>, evsel=<optimized out>, aggr_idx=0, num=num@entry=0x7fffffff7604, from=from@entry=0x0, out=0x7fffffff76c0) at util/stat-shadow.c:307
> me = 0x5555560877d0
> mexp = 0x5555560a40f0
> ctxp = 0x7fffffff78b0
> header_printed = false
> name = 0x555555879b28 ""
> metric_events = <optimized out>
> #3 0x00005555557be4c7 in perf_stat__print_shadow_stats (config=config@entry=0x55555603bd80 <stat_config>, evsel=evsel@entry=0x555556093e20, aggr_idx=aggr_idx@entry=0, out=out@entry=0x7fffffff76c0) at util/stat-shadow.c:325
> print_metric = 0x5555557c1b70 <print_metric_std>
> ctxp = 0x7fffffff78b0
> num = 1
> #4 0x00005555557c03a3 in printout (config=<optimized out>, os=0x7fffffff78b0, uval=<optimized out>, run=1001660763, ena=<optimized out>, noise=<optimized out>, aggr_idx=0) at util/stat-display.c:874
> out = {ctx = 0x7fffffff78b0, print_metric = 0x5555557c1b70 <print_metric_std>, new_line = 0x5555557be5a0 <new_line_std>, print_metricgroup_header = 0x5555557bea60 <print_metricgroup_header_std>, force_header = false}
> pm = <optimized out>
> nl = <optimized out>
> pmh = <optimized out>
> ok = <optimized out>
> counter = 0x555556093e20
> #5 0x00005555557c1424 in print_counter_aggrdata (config=0x55555603bd80 <stat_config>, counter=0x555556093e20, aggr_idx=0, os=<optimized out>) at util/stat-display.c:1013
> output = 0x7ffff69f24e0 <_IO_2_1_stderr_>
> ena = 1001660763
> run = <optimized out>
> val = <optimized out>
> uval = <optimized out>
> ps = <optimized out>
> aggr = <optimized out>
> id = {thread_idx = -1, node = -1, socket = -1, die = -1, cluster = -1, cache_lvl = -1, cache = -1, core = -1, cpu = {cpu = 0}}
> avg = <optimized out>
> metric_only = false
> ...
>
> whereas on
>
> $ git describe
> v6.18-rc1-115-gd702c0f4af6e0
>
> $ perf stat -C11 sleep 1
>
> Performance counter stats for 'CPU(s) 11':
>
> 8 context-switches # 8.0 cs/sec cs_per_second
> 1,001.48 msec cpu-clock # 1.0 CPUs CPUs_utilized
> 0 cpu-migrations # 0.0 migrations/sec migrations_per_second
> 3 page-faults # 3.0 faults/sec page_faults_per_second
> 277,825 branch-misses # 0.3 % branch_miss_rate (33.11%)
> 102,546,153 branches # 102.4 M/sec branch_frequency (33.51%)
> 2,501,242,351 cpu-cycles # 2.5 GHz cycles_frequency (33.55%)
> 317,348,809 instructions # 0.1 instructions insn_per_cycle (33.55%)
>
> 1.001449562 seconds time elapsed
>
> it works without a problem.
Thanks for the bug report! There were changes in v6.19 to how
metrics are computed, hence the fault appearing in the metric
computation. The bisected patch is at fault: it aims to compute the
CPU maps for tool events in a way that avoids unnecessary
sched_setaffinity calls. By using a smaller CPU map it set up a
situation, when the user specifies CPUs, where the tool events could
end up with no CPUs, be removed from the list of events, and cause
the later segfault. I sent out a new patch series:
https://lore.kernel.org/linux-perf-users/20260203225129.4077140-1-irogers@google.com/
that has a revert of this patch, a fix to correctly avoid the segfault
in the prepare_metric function, and a new patch to properly reduce the
CPU maps of the tool events. The remaining affinity reduction patches
are still part of the series and I believe are ready to land.
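
For reference, the crash in the backtrace above is a NULL dereference
of ps (evsel->stats) at stat-shadow.c:85 once the tool event has been
dropped. A rough sketch of the kind of guard prepare_metric needs
(variable names are taken from the backtrace; the actual fix in the
series may differ) is:

	/* Sketch only; the real fix is in the series linked above and
	 * may differ. Guard against an evsel whose stats were never
	 * allocated because it ended up with no CPUs and was removed. */
	struct perf_stat_evsel *ps = metric_events[i]->stats;
	struct perf_stat_aggr *aggr;

	if (!ps) {
		/* Treat the value as missing rather than crashing. */
		continue;
	}
	aggr = &ps->aggr[is_tool_time ? tool_aggr_idx : aggr_idx];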
Thanks,
Ian
> Greetings,
>
> Andres Freund
Hi,

On 2026-02-03 15:05:39 -0800, Ian Rogers wrote:
> Thanks for the bug report! There were changes in v6.19 to how
> metrics are computed, hence the fault appearing in the metric
> computation. The bisected patch is at fault: it aims to compute the
> CPU maps for tool events in a way that avoids unnecessary
> sched_setaffinity calls. By using a smaller CPU map it set up a
> situation, when the user specifies CPUs, where the tool events could
> end up with no CPUs, be removed from the list of events, and cause
> the later segfault.

Thanks for the quick reply!

> I sent out a new patch series:
> https://lore.kernel.org/linux-perf-users/20260203225129.4077140-1-irogers@google.com/
> that has a revert of this patch, a fix to correctly avoid the segfault
> in the prepare_metric function, and a new patch to properly reduce the
> CPU maps of the tool events. The remaining affinity reduction patches
> are still part of the series and I believe are ready to land.

That sounds like it's targeted for 6.20? I assume the revert on its own needs
to land in 6.19 (or stable if too late for 6.19) somehow?

Greetings,

Andres Freund
On Tue, Feb 3, 2026 at 3:27 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2026-02-03 15:05:39 -0800, Ian Rogers wrote:
> > Thanks for the bug report! There were changes in v6.19 to how
> > metrics are computed, hence the fault appearing in the metric
> > computation. The bisected patch is at fault: it aims to compute the
> > CPU maps for tool events in a way that avoids unnecessary
> > sched_setaffinity calls. By using a smaller CPU map it set up a
> > situation, when the user specifies CPUs, where the tool events could
> > end up with no CPUs, be removed from the list of events, and cause
> > the later segfault.
>
> Thanks for the quick reply!
>
> > I sent out a new patch series:
> > https://lore.kernel.org/linux-perf-users/20260203225129.4077140-1-irogers@google.com/
> > that has a revert of this patch, a fix to correctly avoid the segfault
> > in the prepare_metric function, and a new patch to properly reduce the
> > CPU maps of the tool events. The remaining affinity reduction patches
> > are still part of the series and I believe are ready to land.
>
> That sounds like it's targeted for 6.20? I assume the revert on its own needs
> to land in 6.19 (or stable if too late for 6.19) somehow?

Yeah, the maintainers are the ones creating the pull requests and can
do this.

Thanks,
Ian

> Greetings,
>
> Andres Freund