From: Ian Rogers <irogers@google.com>
Date: Fri, 6 Feb 2026 14:25:08 -0800
Subject: [PATCH v8 5/6] perf evlist: Reduce affinity use and move into iterator, fix no affinity
Message-ID: <20260206222509.982489-6-irogers@google.com>
In-Reply-To: <20260206222509.982489-1-irogers@google.com>
References: <20260206222509.982489-1-irogers@google.com>
To: acme@kernel.org
Cc: adrian.hunter@intel.com, ak@linux.intel.com, alexander.shishkin@linux.intel.com,
    andres@anarazel.de, dapeng1.mi@linux.intel.com, irogers@google.com,
    james.clark@linaro.org, jolsa@kernel.org, linux-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, linux@treblig.org, mingo@redhat.com,
    namhyung@kernel.org, peterz@infradead.org, thomas.falcon@intel.com,
    tmricht@linux.ibm.com, yang.lee@linux.alibaba.com

The evlist__for_each_cpu iterator calls sched_setaffinity when moving
between CPUs in order to avoid IPIs. If only one IPI is saved, the
migration may be unprofitable, as the delay before the thread is
scheduled on the target CPU can be considerable. This is particularly
likely when reading an event group in `perf stat` interval mode.

Move the affinity handling entirely into the iterator so that a single
evlist__use_affinity check determines whether CPU affinities will be
used. For `perf record` the change is minimal, as the dummy event plus
the real events always make affinity use worthwhile. In `perf stat`,
tool events are ignored and affinities are only used if more than one
event occurs on the same CPU. Whether affinities are useful is decided
by evlist__use_affinity, which tests per event whether the event's PMU
benefits from affinity use; it is assumed that only PMUs backed by perf
events do.

Also fix a bug where, when no affinities were used, the CPU map
iterator could reference a CPU not present in the initial evsel. Fix it
by making the iterator and non-iterator code paths common.
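As an illustration of the new calling convention (a sketch only, built
from the interfaces this patch introduces; open_all_counters() and
do_work() are hypothetical and not part of the change):

/* Hypothetical per-CPU callback, for illustration only. */
static int do_work(struct evsel *evsel, int cpu_map_idx);

static int open_all_counters(struct evlist *evlist)
{
	struct evlist_cpu_iterator evlist_cpu_itr;

	/*
	 * No caller-managed struct affinity any more: the iterator decides
	 * internally, via evlist__use_affinity(), whether migrating between
	 * CPUs is worthwhile.
	 */
	evlist__for_each_cpu(evlist_cpu_itr, evlist) {
		struct evsel *evsel = evlist_cpu_itr.evsel;

		if (do_work(evsel, evlist_cpu_itr.cpu_map_idx) < 0) {
			/*
			 * Affinity state is released automatically when the
			 * loop runs to completion; breaking out early needs
			 * an explicit exit call.
			 */
			evlist_cpu_iterator__exit(&evlist_cpu_itr);
			return -1;
		}
	}
	return 0;
}

Callers therefore no longer wrap the loop in affinity__setup() and
affinity__cleanup(); breaking out of the loop is the only case that
still needs an explicit call.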
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-stat.c | 108 +++++++++++---------------
 tools/perf/util/evlist.c  | 158 +++++++++++++++++++++++---------
 tools/perf/util/evlist.h  |  26 +++++--
 tools/perf/util/pmu.c     |  12 +++
 tools/perf/util/pmu.h     |   1 +
 5 files changed, 174 insertions(+), 131 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2895b809607f..c1bb40b99176 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -369,19 +369,11 @@ static int read_counter_cpu(struct evsel *counter, int cpu_map_idx)
 static int read_counters_with_affinity(void)
 {
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity;
 
 	if (all_counters_use_bpf)
 		return 0;
 
-	if (!target__has_cpu(&target) || target__has_per_thread(&target))
-		affinity = NULL;
-	else if (affinity__setup(&saved_affinity) < 0)
-		return -1;
-	else
-		affinity = &saved_affinity;
-
-	evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
+	evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
 		struct evsel *counter = evlist_cpu_itr.evsel;
 
 		if (evsel__is_bpf(counter))
@@ -393,8 +385,6 @@ static int read_counters_with_affinity(void)
 		if (!counter->err)
 			counter->err = read_counter_cpu(counter, evlist_cpu_itr.cpu_map_idx);
 	}
-	if (affinity)
-		affinity__cleanup(&saved_affinity);
 
 	return 0;
 }
@@ -793,7 +783,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	const bool forks = (argc > 0);
 	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity = NULL;
 	int err, open_err = 0;
 	bool second_pass = false, has_supported_counters;
 
@@ -805,14 +794,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		child_pid = evsel_list->workload.pid;
 	}
 
-	if (!cpu_map__is_dummy(evsel_list->core.user_requested_cpus)) {
-		if (affinity__setup(&saved_affinity) < 0) {
-			err = -1;
-			goto err_out;
-		}
-		affinity = &saved_affinity;
-	}
-
 	evlist__for_each_entry(evsel_list, counter) {
 		counter->reset_group = false;
 		if (bpf_counter__load(counter, &target)) {
@@ -825,49 +806,48 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 	evlist__reset_aggr_stats(evsel_list);
 
-	evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
-		counter = evlist_cpu_itr.evsel;
+	/*
+	 * bperf calls evsel__open_per_cpu() in bperf__load(), so
+	 * no need to call it again here.
+	 */
+	if (!target.use_bpf) {
+		evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
+			counter = evlist_cpu_itr.evsel;
 
-		/*
-		 * bperf calls evsel__open_per_cpu() in bperf__load(), so
-		 * no need to call it again here.
-		 */
-		if (target.use_bpf)
-			break;
+			if (counter->reset_group || !counter->supported)
+				continue;
+			if (evsel__is_bperf(counter))
+				continue;
 
-		if (counter->reset_group || !counter->supported)
-			continue;
-		if (evsel__is_bperf(counter))
-			continue;
+			while (true) {
+				if (create_perf_stat_counter(counter, &stat_config,
+							     evlist_cpu_itr.cpu_map_idx) == 0)
+					break;
 
-		while (true) {
-			if (create_perf_stat_counter(counter, &stat_config,
-						     evlist_cpu_itr.cpu_map_idx) == 0)
-				break;
+				open_err = errno;
+				/*
+				 * Weak group failed. We cannot just undo this
+				 * here because earlier CPUs might be in group
+				 * mode, and the kernel doesn't support mixing
+				 * group and non group reads. Defer it to later.
+				 * Don't close here because we're in the wrong
+				 * affinity.
+				 */
+				if ((open_err == EINVAL || open_err == EBADF) &&
+				    evsel__leader(counter) != counter &&
+				    counter->weak_group) {
+					evlist__reset_weak_group(evsel_list, counter, false);
+					assert(counter->reset_group);
+					counter->supported = true;
+					second_pass = true;
+					break;
+				}
 
-			open_err = errno;
-			/*
-			 * Weak group failed. We cannot just undo this here
-			 * because earlier CPUs might be in group mode, and the kernel
-			 * doesn't support mixing group and non group reads. Defer
-			 * it to later.
-			 * Don't close here because we're in the wrong affinity.
-			 */
-			if ((open_err == EINVAL || open_err == EBADF) &&
-			    evsel__leader(counter) != counter &&
-			    counter->weak_group) {
-				evlist__reset_weak_group(evsel_list, counter, false);
-				assert(counter->reset_group);
-				counter->supported = true;
-				second_pass = true;
-				break;
+				if (stat_handle_error(counter, open_err) != COUNTER_RETRY)
+					break;
 			}
-
-			if (stat_handle_error(counter, open_err) != COUNTER_RETRY)
-				break;
 		}
 	}
-
 	if (second_pass) {
 		/*
 		 * Now redo all the weak group after closing them,
@@ -875,7 +855,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		 */
 
 		/* First close errored or weak retry */
-		evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
+		evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
 			counter = evlist_cpu_itr.evsel;
 
 			if (!counter->reset_group && counter->supported)
@@ -884,7 +864,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 				perf_evsel__close_cpu(&counter->core, evlist_cpu_itr.cpu_map_idx);
 		}
 		/* Now reopen weak */
-		evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
+		evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
 			counter = evlist_cpu_itr.evsel;
 
 			if (!counter->reset_group)
@@ -893,17 +873,18 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 			while (true) {
 				pr_debug2("reopening weak %s\n", evsel__name(counter));
 				if (create_perf_stat_counter(counter, &stat_config,
-							     evlist_cpu_itr.cpu_map_idx) == 0)
+							     evlist_cpu_itr.cpu_map_idx) == 0) {
+					evlist_cpu_iterator__exit(&evlist_cpu_itr);
 					break;
-
+				}
 				open_err = errno;
-				if (stat_handle_error(counter, open_err) != COUNTER_RETRY)
+				if (stat_handle_error(counter, open_err) != COUNTER_RETRY) {
+					evlist_cpu_iterator__exit(&evlist_cpu_itr);
 					break;
+				}
 			}
 		}
 	}
-	affinity__cleanup(affinity);
-	affinity = NULL;
 
 	has_supported_counters = false;
 	evlist__for_each_entry(evsel_list, counter) {
@@ -1065,7 +1046,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (forks)
 		evlist__cancel_workload(evsel_list);
 
-	affinity__cleanup(affinity);
 	return err;
 }
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3abc2215e790..45833244daf3 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -359,36 +359,111 @@ int evlist__add_newtp(struct evlist *evlist, const char *sys, const char *name,
 }
 #endif
 
-struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity)
+/*
+ * Should sched_setaffinity be used with evlist__for_each_cpu? Determine if
+ * migrating the thread will avoid possibly numerous IPIs.
+ */
+static bool evlist__use_affinity(struct evlist *evlist)
+{
+	struct evsel *pos;
+	struct perf_cpu_map *used_cpus = NULL;
+	bool ret = false;
+
+	/*
+	 * With perf record core.user_requested_cpus is usually NULL.
+	 * Use the old method to handle this for now.
+	 */
+	if (!evlist->core.user_requested_cpus ||
+	    cpu_map__is_dummy(evlist->core.user_requested_cpus))
+		return false;
+
+	evlist__for_each_entry(evlist, pos) {
+		struct perf_cpu_map *intersect;
+
+		if (!perf_pmu__benefits_from_affinity(pos->pmu))
+			continue;
+
+		if (evsel__is_dummy_event(pos)) {
+			/*
+			 * The dummy event is opened on all CPUs so assume >1
+			 * event with shared CPUs.
+			 */
+			ret = true;
+			break;
+		}
+		if (evsel__is_retire_lat(pos)) {
+			/*
+			 * Retirement latency events are similar to tool ones in
+			 * their implementation, and so don't require affinity.
+			 */
+			continue;
+		}
+		if (perf_cpu_map__is_empty(used_cpus)) {
+			/* First benefitting event, we want >1 on a common CPU. */
+			used_cpus = perf_cpu_map__get(pos->core.cpus);
+			continue;
+		}
+		if ((pos->core.attr.read_format & PERF_FORMAT_GROUP) &&
+		    evsel__leader(pos) != pos) {
+			/* Skip members of the same sample group. */
+			continue;
+		}
+		intersect = perf_cpu_map__intersect(used_cpus, pos->core.cpus);
+		if (!perf_cpu_map__is_empty(intersect)) {
+			/* >1 event with shared CPUs. */
+			perf_cpu_map__put(intersect);
+			ret = true;
+			break;
+		}
+		perf_cpu_map__put(intersect);
+		perf_cpu_map__merge(&used_cpus, pos->core.cpus);
+	}
+	perf_cpu_map__put(used_cpus);
+	return ret;
+}
+
+void evlist_cpu_iterator__init(struct evlist_cpu_iterator *itr, struct evlist *evlist)
 {
-	struct evlist_cpu_iterator itr = {
+	*itr = (struct evlist_cpu_iterator){
 		.container = evlist,
 		.evsel = NULL,
 		.cpu_map_idx = 0,
 		.evlist_cpu_map_idx = 0,
 		.evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.all_cpus),
 		.cpu = (struct perf_cpu){ .cpu = -1},
-		.affinity = affinity,
+		.affinity = NULL,
 	};
 
 	if (evlist__empty(evlist)) {
 		/* Ensure the empty list doesn't iterate. */
-		itr.evlist_cpu_map_idx = itr.evlist_cpu_map_nr;
-	} else {
-		itr.evsel = evlist__first(evlist);
-		if (itr.affinity) {
-			itr.cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0);
-			affinity__set(itr.affinity, itr.cpu.cpu);
-			itr.cpu_map_idx = perf_cpu_map__idx(itr.evsel->core.cpus, itr.cpu);
-			/*
-			 * If this CPU isn't in the evsel's cpu map then advance
-			 * through the list.
-			 */
-			if (itr.cpu_map_idx == -1)
-				evlist_cpu_iterator__next(&itr);
-		}
+		itr->evlist_cpu_map_idx = itr->evlist_cpu_map_nr;
+		return;
 	}
-	return itr;
+
+	if (evlist__use_affinity(evlist)) {
+		if (affinity__setup(&itr->saved_affinity) == 0)
+			itr->affinity = &itr->saved_affinity;
+	}
+	itr->evsel = evlist__first(evlist);
+	itr->cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0);
+	if (itr->affinity)
+		affinity__set(itr->affinity, itr->cpu.cpu);
+	itr->cpu_map_idx = perf_cpu_map__idx(itr->evsel->core.cpus, itr->cpu);
+	/*
+	 * If this CPU isn't in the evsel's cpu map then advance
+	 * through the list.
+	 */
+	if (itr->cpu_map_idx == -1)
+		evlist_cpu_iterator__next(itr);
+}
+
+void evlist_cpu_iterator__exit(struct evlist_cpu_iterator *itr)
+{
+	if (!itr->affinity)
+		return;
+
+	affinity__cleanup(itr->affinity);
+	itr->affinity = NULL;
 }
 
 void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr)
@@ -418,14 +493,11 @@ void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr)
 		 */
 		if (evlist_cpu_itr->cpu_map_idx == -1)
 			evlist_cpu_iterator__next(evlist_cpu_itr);
+	} else {
+		evlist_cpu_iterator__exit(evlist_cpu_itr);
 	}
 }
 
-bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr)
-{
-	return evlist_cpu_itr->evlist_cpu_map_idx >= evlist_cpu_itr->evlist_cpu_map_nr;
-}
-
 static int evsel__strcmp(struct evsel *pos, char *evsel_name)
 {
 	if (!evsel_name)
@@ -453,19 +525,11 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name, bool excl
 {
 	struct evsel *pos;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity = NULL;
 	bool has_imm = false;
 
-	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.user_requested_cpus)) {
-		if (affinity__setup(&saved_affinity) < 0)
-			return;
-		affinity = &saved_affinity;
-	}
-
 	/* Disable 'immediate' events last */
 	for (int imm = 0; imm <= 1; imm++) {
-		evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity) {
+		evlist__for_each_cpu(evlist_cpu_itr, evlist) {
 			pos = evlist_cpu_itr.evsel;
 			if (evsel__strcmp(pos, evsel_name))
 				continue;
@@ -483,7 +547,6 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name, bool excl
 		break;
 	}
 
-	affinity__cleanup(affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (evsel__strcmp(pos, evsel_name))
 			continue;
@@ -523,16 +586,8 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name, bool excl_
 {
 	struct evsel *pos;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity = NULL;
 
-	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.user_requested_cpus)) {
-		if (affinity__setup(&saved_affinity) < 0)
-			return;
-		affinity = &saved_affinity;
-	}
-
-	evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity) {
+	evlist__for_each_cpu(evlist_cpu_itr, evlist) {
 		pos = evlist_cpu_itr.evsel;
 		if (evsel__strcmp(pos, evsel_name))
 			continue;
@@ -542,7 +597,6 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name, bool excl_
 			continue;
 		evsel__enable_cpu(pos, evlist_cpu_itr.cpu_map_idx);
 	}
-	affinity__cleanup(affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (evsel__strcmp(pos, evsel_name))
 			continue;
@@ -1339,30 +1393,14 @@ void evlist__close(struct evlist *evlist)
 {
 	struct evsel *evsel;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity affinity;
-
-	/*
-	 * With perf record core.user_requested_cpus is usually NULL.
-	 * Use the old method to handle this for now.
-	 */
-	if (!evlist->core.user_requested_cpus ||
-	    cpu_map__is_dummy(evlist->core.user_requested_cpus)) {
-		evlist__for_each_entry_reverse(evlist, evsel)
-			evsel__close(evsel);
-		return;
-	}
-
-	if (affinity__setup(&affinity) < 0)
-		return;
 
-	evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) {
+	evlist__for_each_cpu(evlist_cpu_itr, evlist) {
 		if (evlist_cpu_itr.cpu_map_idx == 0 && evsel__is_retire_lat(evlist_cpu_itr.evsel))
 			evsel__tpebs_close(evlist_cpu_itr.evsel);
 		perf_evsel__close_cpu(&evlist_cpu_itr.evsel->core, evlist_cpu_itr.cpu_map_idx);
 	}
 
-	affinity__cleanup(&affinity);
 	evlist__for_each_entry_reverse(evlist, evsel) {
 		perf_evsel__free_fd(&evsel->core);
 		perf_evsel__free_id(&evsel->core);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 911834ae7c2a..30dff7484d3c 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include "affinity.h"
 #include "events_stats.h"
 #include "evsel.h"
 #include "rblist.h"
@@ -363,6 +364,8 @@ struct evlist_cpu_iterator {
 	struct perf_cpu cpu;
 	/** If present, used to set the affinity when switching between CPUs. */
 	struct affinity *affinity;
+	/** May be used to hold affinity state prior to iterating. */
+	struct affinity saved_affinity;
 };
 
 /**
@@ -370,22 +373,31 @@ struct evlist_cpu_iterator {
 *                       affinity, iterate over all CPUs and then the evlist
 *                       for each evsel on that CPU. When switching between
 *                       CPUs the affinity is set to the CPU to avoid IPIs
- *                       during syscalls.
+ *                       during syscalls. The affinity is set up and removed
+ *                       automatically; if the loop is broken, a call to
+ *                       evlist_cpu_iterator__exit is necessary.
 * @evlist_cpu_itr: the iterator instance.
 * @evlist: evlist instance to iterate.
- * @affinity: NULL or used to set the affinity to the current CPU.
 */
-#define evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity)	\
-	for ((evlist_cpu_itr) = evlist__cpu_begin(evlist, affinity);	\
+#define evlist__for_each_cpu(evlist_cpu_itr, evlist)	\
+	for (evlist_cpu_iterator__init(&(evlist_cpu_itr), evlist);	\
 	     !evlist_cpu_iterator__end(&evlist_cpu_itr);	\
 	     evlist_cpu_iterator__next(&evlist_cpu_itr))
 
-/** Returns an iterator set to the first CPU/evsel of evlist. */
-struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity);
+/** Set up an iterator set to the first CPU/evsel of evlist. */
+void evlist_cpu_iterator__init(struct evlist_cpu_iterator *itr, struct evlist *evlist);
+/**
+ * Cleans up the iterator, automatically done by evlist_cpu_iterator__next when
+ * the end of the list is reached. Multiple calls are safe.
+ */
+void evlist_cpu_iterator__exit(struct evlist_cpu_iterator *itr);
 /** Move to next element in iterator, updating CPU, evsel and the affinity. */
 void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr);
 /** Returns true when iterator is at the end of the CPUs and evlist. */
-bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr);
+static inline bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr)
+{
+	return evlist_cpu_itr->evlist_cpu_map_idx >= evlist_cpu_itr->evlist_cpu_map_nr;
+}
 
 struct evsel *evlist__get_tracking_event(struct evlist *evlist);
 void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 81ab74681c9b..5cdd350e8885 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -2375,6 +2375,18 @@ bool perf_pmu__is_software(const struct perf_pmu *pmu)
 	return false;
 }
 
+bool perf_pmu__benefits_from_affinity(struct perf_pmu *pmu)
+{
+	if (!pmu)
+		return true; /* Assume is core. */
+
+	/*
+	 * All perf event PMUs should benefit from accessing the perf event
+	 * contexts on the local CPU.
+	 */
+	return pmu->type <= PERF_PMU_TYPE_PE_END;
+}
+
 FILE *perf_pmu__open_file(const struct perf_pmu *pmu, const char *name)
 {
 	char path[PATH_MAX];
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 41c21389f393..0d9f3c57e8e8 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -303,6 +303,7 @@ bool perf_pmu__name_no_suffix_match(const struct perf_pmu *pmu, const char *to_m
  * perf_sw_context in the kernel?
  */
 bool perf_pmu__is_software(const struct perf_pmu *pmu);
+bool perf_pmu__benefits_from_affinity(struct perf_pmu *pmu);
 
 FILE *perf_pmu__open_file(const struct perf_pmu *pmu, const char *name);
 FILE *perf_pmu__open_file_at(const struct perf_pmu *pmu, int dirfd, const char *name);
-- 
2.53.0.rc2.204.g2597b5adb4-goog