Add flag that disables affinity behavior. Using sched_setaffinity to
place a perf thread on a CPU can avoid certain interprocessor
interrupts but may introduce a delay due to the scheduling,
particularly on loaded machines. Add a command line option to disable
the behavior. This behavior is less present in other tools like `perf
record`, as it uses a ring buffer and doesn't make repeated system
calls.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/Documentation/perf-stat.txt | 4 ++++
tools/perf/builtin-stat.c | 6 ++++++
tools/perf/util/evlist.c | 6 +-----
tools/perf/util/evlist.h | 1 +
4 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 1a766d4a2233..1ffb510606af 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -382,6 +382,10 @@ color the metric's computed value.
Don't print output, warnings or messages. This is useful with perf stat
record below to only write data to the perf.data file.
+--no-affinity::
+Don't change scheduler affinities when iterating over CPUs. Disables
+an optimization aimed at minimizing interprocessor interrupts.
+
STAT RECORD
-----------
Stores stat data into perf data file.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bb14268e7393..ddda0ea62eaf 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2427,6 +2427,7 @@ static int parse_tpebs_mode(const struct option *opt, const char *str,
int cmd_stat(int argc, const char **argv)
{
struct opt_aggr_mode opt_mode = {};
+ bool affinity = true, affinity_set = false;
struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", &transaction_run,
"hardware transaction statistics"),
@@ -2555,6 +2556,8 @@ int cmd_stat(int argc, const char **argv)
"don't print 'summary' for CSV summary output"),
OPT_BOOLEAN(0, "quiet", &quiet,
"don't print any output, messages or warnings (useful with record)"),
+ OPT_BOOLEAN_SET(0, "affinity", &affinity, &affinity_set,
+ "don't allow affinity optimizations aimed at reducing IPIs"),
OPT_CALLBACK(0, "cputype", &evsel_list, "hybrid cpu type",
"Only enable events on applying cpu with this type "
"for hybrid platform (e.g. core or atom)",
@@ -2612,6 +2615,9 @@ int cmd_stat(int argc, const char **argv)
} else
stat_config.csv_sep = DEFAULT_SEPARATOR;
+ if (affinity_set)
+ evsel_list->no_affinity = !affinity;
+
if (argc && strlen(argv[0]) > 2 && strstarts("record", argv[0])) {
argc = __cmd_record(stat_options, &opt_mode, argc, argv);
if (argc < 0)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index d62b8bab8fa4..00fb3cf45bae 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -369,11 +369,7 @@ static bool evlist__use_affinity(struct evlist *evlist)
struct perf_cpu_map *used_cpus = NULL;
bool ret = false;
- /*
- * With perf record core.user_requested_cpus is usually NULL.
- * Use the old method to handle this for now.
- */
- if (!evlist->core.user_requested_cpus ||
+ if (evlist->no_affinity || !evlist->core.user_requested_cpus ||
cpu_map__is_dummy(evlist->core.user_requested_cpus))
return false;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 30dff7484d3c..d17c3b57a409 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -59,6 +59,7 @@ struct event_enable_timer;
struct evlist {
struct perf_evlist core;
bool enabled;
+ bool no_affinity;
int id_pos;
int is_pos;
int nr_br_cntr;
--
2.52.0.457.g6b5491de43-goog
On Thu, Jan 08, 2026 at 01:26:52PM -0800, Ian Rogers wrote:
> Add flag that disables affinity behavior. Using sched_setaffinity to
> place a perf thread on a CPU can avoid certain interprocessor
> interrupts but may introduce a delay due to the scheduling,
> particularly on loaded machines. Add a command line option to disable
> the behavior. This behavior is less present in other tools like `perf
> record`, as it uses a ring buffer and doesn't make repeated system
> calls.

I thought we had agreed that this change isn't needed?

-Andi

On Thu, Jan 8, 2026 at 2:07 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Thu, Jan 08, 2026 at 01:26:52PM -0800, Ian Rogers wrote:
> > Add flag that disables affinity behavior. Using sched_setaffinity to
> > place a perf thread on a CPU can avoid certain interprocessor
> > interrupts but may introduce a delay due to the scheduling,
> > particularly on loaded machines. Add a command line option to disable
> > the behavior. This behavior is less present in other tools like `perf
> > record`, as it uses a ring buffer and doesn't make repeated system
> > calls.
>
> I thought we had agreed that this change isn't needed?

This patch or the series? My last feedback was:
https://lore.kernel.org/lkml/CAP-5=fUvsF7RtLAKaMwc28CeSEOJ+j0gVwvQN59moOnUS=kWVg@mail.gmail.com/

So the code as-is is trying to always use setaffinity. For a single
syscall on a particular CPU this is unlikely to be profitable on a
machine under load, the IPI will happen faster. You mentioned that
realtime priorities could address this but that this also required
capabilities. I didn't see that as something that contradicted the use
of these patches.

The point of the flag in this change is so that the IPI behavior can be
had should issues with CPU affinities be experienced. With how the code
is refactored we can also make the existing "always use affinities"
behavior an option by modifying evlist__use_affinity, but given the
cost of an IPI should be less than that of migrating a thread it
doesn't seem like this option would be useful - you could also make the
code do this by adding a dummy event as >1 event triggers the
setaffinity behavior.

Anyway, I think we still want these changes that successfully fixed a
customer issue I had. The already merged patches do improve things, and
I think we can do yet more, but the real cause of the delay in reading
counters was the calls to setaffinity and this being slow on loaded
machines. The patches make it so setaffinity doesn't happen if there is
just a single IPI being saved or if requested on the command line.

Thanks,
Ian

> -Andi
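
For readers following the trade-off described above, here is a minimal
sketch in C of the two pieces involved: deciding whether migrating the
thread is worthwhile, and pinning the thread to a CPU around a piece of
work. The helper names should_use_affinity() and run_on_cpu() are
hypothetical and are not perf's code; the decision rule only mirrors the
description in this thread (skip affinity when --no-affinity is given or
when only a single IPI would be saved), and the real
evlist__use_affinity() performs additional checks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not perf's evlist__use_affinity(): migrate the
 * thread only if the user did not disable it and more than one event
 * would otherwise need an IPI on the same CPU.
 */
bool should_use_affinity(bool no_affinity, int nr_events)
{
	if (no_affinity)
		return false;	/* user asked for --no-affinity */
	return nr_events > 1;	/* one event: a single IPI is assumed cheaper */
}

/*
 * Hypothetical helper: run fn(arg) with the calling thread pinned to
 * 'cpu', then restore the previous affinity mask.
 * Returns 0 on success, -1 if the mask could not be read, set or restored.
 */
int run_on_cpu(int cpu, void (*fn)(void *), void *arg)
{
	cpu_set_t saved, target;

	assert(fn != NULL);
	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;
	/* pid 0 means the calling thread */
	if (sched_getaffinity(0, sizeof(saved), &saved))
		return -1;
	CPU_ZERO(&target);
	CPU_SET(cpu, &target);
	/* This call is the step that may stall on a loaded machine. */
	if (sched_setaffinity(0, sizeof(target), &target))
		return -1;
	fn(arg);
	return sched_setaffinity(0, sizeof(saved), &saved);
}
```

In this sketch a caller would use run_on_cpu() only when
should_use_affinity() returns true, and would otherwise issue the read
from whichever CPU the thread is on and accept the IPI.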