From nobody Sun Feb 8 11:25:56 2026
Date: Thu, 8 Jan 2026 13:26:50 -0800
In-Reply-To: <20260108212652.768875-1-irogers@google.com>
References: <20260108212652.768875-1-irogers@google.com>
Message-ID: <20260108212652.768875-2-irogers@google.com>
X-Mailer: git-send-email 2.52.0.457.g6b5491de43-goog
Subject: [PATCH v6 1/3] perf evlist: Missing TPEBS close in evlist__close
From: Ian Rogers
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
 "Dr. David Alan Gilbert", Yang Li, James Clark, Thomas Falcon,
 Thomas Richter, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, Andi Kleen, Dapeng Mi
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The libperf evsel close won't close TPEBS events properly, so add a check
for retirement latency events in evlist__close and close them explicitly.
The libperf close routine is used in evlist__close for affinity reasons.

Signed-off-by: Ian Rogers
---
 tools/perf/util/evlist.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 649519628541..bb042d89e6a0 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1356,6 +1356,8 @@ void evlist__close(struct evlist *evlist)
 		return;
 
 	evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) {
+		if (evlist_cpu_itr.cpu_map_idx == 0 && evsel__is_retire_lat(evlist_cpu_itr.evsel))
+			evsel__tpebs_close(evlist_cpu_itr.evsel);
 		perf_evsel__close_cpu(&evlist_cpu_itr.evsel->core,
 				      evlist_cpu_itr.cpu_map_idx);
 	}
-- 
2.52.0.457.g6b5491de43-goog
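For readers following along: retirement latency (TPEBS) events keep per-event state beyond the per-CPU perf_event file descriptors, so closing only the descriptors leaks that state. The sketch below is a simplified, illustrative rendering of the close path this patch produces, not the verbatim tools/perf source; the helper function name evlist__close_sketch is hypothetical.

/*
 * Simplified sketch of evlist__close() after this patch (illustrative
 * only). The TPEBS teardown runs once per evsel, on its first CPU map
 * index, because the retirement latency state is per event, not per CPU.
 */
void evlist__close_sketch(struct evlist *evlist)
{
	struct evlist_cpu_iterator itr;

	evlist__for_each_cpu(itr, evlist, /*affinity=*/NULL) {
		/* Close the TPEBS side of the event exactly once per evsel. */
		if (itr.cpu_map_idx == 0 && evsel__is_retire_lat(itr.evsel))
			evsel__tpebs_close(itr.evsel);
		/* Close the per-CPU perf_event file descriptor via libperf. */
		perf_evsel__close_cpu(&itr.evsel->core, itr.cpu_map_idx);
	}
}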
From nobody Sun Feb 8 11:25:56 2026
Date: Thu, 8 Jan 2026 13:26:51 -0800
In-Reply-To: <20260108212652.768875-1-irogers@google.com>
References: <20260108212652.768875-1-irogers@google.com>
Message-ID: <20260108212652.768875-3-irogers@google.com>
X-Mailer: git-send-email 2.52.0.457.g6b5491de43-goog
Subject: [PATCH v6 2/3] perf evlist: Reduce affinity use and move into iterator, fix no affinity
From: Ian Rogers
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
 "Dr. David Alan Gilbert", Yang Li, James Clark, Thomas Falcon,
 Thomas Richter, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, Andi Kleen, Dapeng Mi
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The evlist__for_each_cpu iterator will call sched_setaffinity when moving
between CPUs to avoid IPIs. If only one IPI is saved this may be
unprofitable, as the delay to get scheduled may be considerable.
This may be particularly true when reading an event group in `perf stat`
interval mode.

Move the affinity handling completely into the iterator so that a single
evlist__use_affinity call can determine whether CPU affinities will be
used. For `perf record` the change is minimal, as the dummy event plus the
real event always make affinity use worthwhile. In `perf stat`, tool
events are ignored and affinities are only used if more than one event
occurs on the same CPU. Whether affinities are useful is determined by
evlist__use_affinity, which tests per event whether the event's PMU
benefits from affinity use - it is assumed that only PMUs backed by perf
events do.

Also fix a bug where, when affinities aren't used, the CPU map iterator
could reference a CPU not present in the initial evsel's CPU map. Fix it
by making the iterator and non-iterator code common.

Signed-off-by: Ian Rogers
---
 tools/perf/builtin-stat.c | 108 +++++++++++---------------
 tools/perf/util/evlist.c  | 158 +++++++++++++++++++++++---------------
 tools/perf/util/evlist.h  |  26 +++++--
 tools/perf/util/pmu.c     |  12 +++
 tools/perf/util/pmu.h     |   1 +
 5 files changed, 174 insertions(+), 131 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ab40d85fb125..bb14268e7393 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -369,19 +369,11 @@ static int read_counter_cpu(struct evsel *counter, int cpu_map_idx)
 static int read_counters_with_affinity(void)
 {
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity;
 
 	if (all_counters_use_bpf)
 		return 0;
 
-	if (!target__has_cpu(&target) || target__has_per_thread(&target))
-		affinity = NULL;
-	else if (affinity__setup(&saved_affinity) < 0)
-		return -1;
-	else
-		affinity = &saved_affinity;
-
-	evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
+	evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
 		struct evsel *counter = evlist_cpu_itr.evsel;
 
 		if (evsel__is_bpf(counter))
@@ -393,8 +385,6 @@ static int read_counters_with_affinity(void)
 		if (!counter->err)
 			counter->err = read_counter_cpu(counter, evlist_cpu_itr.cpu_map_idx);
 	}
-	if (affinity)
-		affinity__cleanup(&saved_affinity);
 
 	return 0;
 }
@@ -793,7 +783,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	const bool forks = (argc > 0);
 	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity = NULL;
 	int err, open_err = 0;
 	bool second_pass = false, has_supported_counters;
 
@@ -805,14 +794,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		child_pid = evsel_list->workload.pid;
 	}
 
-	if (!cpu_map__is_dummy(evsel_list->core.user_requested_cpus)) {
-		if (affinity__setup(&saved_affinity) < 0) {
-			err = -1;
-			goto err_out;
-		}
-		affinity = &saved_affinity;
-	}
-
 	evlist__for_each_entry(evsel_list, counter) {
 		counter->reset_group = false;
 		if (bpf_counter__load(counter, &target)) {
@@ -825,49 +806,48 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 	evlist__reset_aggr_stats(evsel_list);
 
-	evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
-		counter = evlist_cpu_itr.evsel;
+	/*
+	 * bperf calls evsel__open_per_cpu() in bperf__load(), so
+	 * no need to call it again here.
+	 */
+	if (!target.use_bpf) {
+		evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
+			counter = evlist_cpu_itr.evsel;
 
-		/*
-		 * bperf calls evsel__open_per_cpu() in bperf__load(), so
-		 * no need to call it again here.
-		 */
-		if (target.use_bpf)
-			break;
+			if (counter->reset_group || !counter->supported)
+				continue;
+			if (evsel__is_bperf(counter))
+				continue;
 
-		if (counter->reset_group || !counter->supported)
-			continue;
-		if (evsel__is_bperf(counter))
-			continue;
+			while (true) {
+				if (create_perf_stat_counter(counter, &stat_config,
+							     evlist_cpu_itr.cpu_map_idx) == 0)
+					break;
 
-		while (true) {
-			if (create_perf_stat_counter(counter, &stat_config,
-						     evlist_cpu_itr.cpu_map_idx) == 0)
-				break;
+				open_err = errno;
+				/*
+				 * Weak group failed. We cannot just undo this
+				 * here because earlier CPUs might be in group
+				 * mode, and the kernel doesn't support mixing
+				 * group and non group reads. Defer it to later.
+				 * Don't close here because we're in the wrong
+				 * affinity.
+				 */
+				if ((open_err == EINVAL || open_err == EBADF) &&
+				    evsel__leader(counter) != counter &&
+				    counter->weak_group) {
+					evlist__reset_weak_group(evsel_list, counter, false);
+					assert(counter->reset_group);
+					counter->supported = true;
+					second_pass = true;
+					break;
+				}
 
-			open_err = errno;
-			/*
-			 * Weak group failed. We cannot just undo this here
-			 * because earlier CPUs might be in group mode, and the kernel
-			 * doesn't support mixing group and non group reads. Defer
-			 * it to later.
-			 * Don't close here because we're in the wrong affinity.
-			 */
-			if ((open_err == EINVAL || open_err == EBADF) &&
-			    evsel__leader(counter) != counter &&
-			    counter->weak_group) {
-				evlist__reset_weak_group(evsel_list, counter, false);
-				assert(counter->reset_group);
-				counter->supported = true;
-				second_pass = true;
-				break;
+				if (stat_handle_error(counter, open_err) != COUNTER_RETRY)
+					break;
 			}
-
-			if (stat_handle_error(counter, open_err) != COUNTER_RETRY)
-				break;
 		}
 	}
-
 	if (second_pass) {
 		/*
 		 * Now redo all the weak group after closing them,
 		 */
 
 		/* First close errored or weak retry */
-		evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
+		evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
 			counter = evlist_cpu_itr.evsel;
 
 			if (!counter->reset_group && counter->supported)
@@ -884,7 +864,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 			perf_evsel__close_cpu(&counter->core, evlist_cpu_itr.cpu_map_idx);
 		}
 		/* Now reopen weak */
-		evlist__for_each_cpu(evlist_cpu_itr, evsel_list, affinity) {
+		evlist__for_each_cpu(evlist_cpu_itr, evsel_list) {
 			counter = evlist_cpu_itr.evsel;
 
 			if (!counter->reset_group)
@@ -893,17 +873,18 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 			while (true) {
 				pr_debug2("reopening weak %s\n", evsel__name(counter));
 				if (create_perf_stat_counter(counter, &stat_config,
-							     evlist_cpu_itr.cpu_map_idx) == 0)
+							     evlist_cpu_itr.cpu_map_idx) == 0) {
+					evlist_cpu_iterator__exit(&evlist_cpu_itr);
 					break;
-
+				}
 				open_err = errno;
-				if (stat_handle_error(counter, open_err) != COUNTER_RETRY)
+				if (stat_handle_error(counter, open_err) != COUNTER_RETRY) {
+					evlist_cpu_iterator__exit(&evlist_cpu_itr);
 					break;
+				}
 			}
 		}
 	}
-	affinity__cleanup(affinity);
-	affinity = NULL;
 
 	has_supported_counters = false;
 	evlist__for_each_entry(evsel_list, counter) {
@@ -1066,7 +1047,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (forks)
 		evlist__cancel_workload(evsel_list);
 
-	affinity__cleanup(affinity);
 	return err;
 }
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index bb042d89e6a0..d62b8bab8fa4 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -359,36 +359,111 @@ int evlist__add_newtp(struct evlist *evlist, const char *sys, const char *name,
 }
 #endif
 
-struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity)
+/*
+ * Should sched_setaffinity be used with evlist__for_each_cpu? Determine if
+ * migrating the thread will avoid possibly numerous IPIs.
+ */
+static bool evlist__use_affinity(struct evlist *evlist)
+{
+	struct evsel *pos;
+	struct perf_cpu_map *used_cpus = NULL;
+	bool ret = false;
+
+	/*
+	 * With perf record core.user_requested_cpus is usually NULL.
+	 * Use the old method to handle this for now.
+	 */
+	if (!evlist->core.user_requested_cpus ||
+	    cpu_map__is_dummy(evlist->core.user_requested_cpus))
+		return false;
+
+	evlist__for_each_entry(evlist, pos) {
+		struct perf_cpu_map *intersect;
+
+		if (!perf_pmu__benefits_from_affinity(pos->pmu))
+			continue;
+
+		if (evsel__is_dummy_event(pos)) {
+			/*
+			 * The dummy event is opened on all CPUs so assume >1
+			 * event with shared CPUs.
+			 */
+			ret = true;
+			break;
+		}
+		if (evsel__is_retire_lat(pos)) {
+			/*
+			 * Retirement latency events are similar to tool ones in
+			 * their implementation, and so don't require affinity.
+			 */
+			continue;
+		}
+		if (perf_cpu_map__is_empty(used_cpus)) {
+			/* First benefitting event, we want >1 on a common CPU. */
+			used_cpus = perf_cpu_map__get(pos->core.cpus);
+			continue;
+		}
+		if ((pos->core.attr.read_format & PERF_FORMAT_GROUP) &&
+		    evsel__leader(pos) != pos) {
+			/* Skip members of the same sample group. */
+			continue;
+		}
+		intersect = perf_cpu_map__intersect(used_cpus, pos->core.cpus);
+		if (!perf_cpu_map__is_empty(intersect)) {
+			/* >1 event with shared CPUs. */
+			perf_cpu_map__put(intersect);
+			ret = true;
+			break;
+		}
+		perf_cpu_map__put(intersect);
+		perf_cpu_map__merge(&used_cpus, pos->core.cpus);
+	}
+	perf_cpu_map__put(used_cpus);
+	return ret;
+}
+
+void evlist_cpu_iterator__init(struct evlist_cpu_iterator *itr, struct evlist *evlist)
 {
-	struct evlist_cpu_iterator itr = {
+	*itr = (struct evlist_cpu_iterator){
 		.container = evlist,
 		.evsel = NULL,
 		.cpu_map_idx = 0,
 		.evlist_cpu_map_idx = 0,
 		.evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.all_cpus),
 		.cpu = (struct perf_cpu){ .cpu = -1},
-		.affinity = affinity,
+		.affinity = NULL,
 	};
 
 	if (evlist__empty(evlist)) {
 		/* Ensure the empty list doesn't iterate. */
-		itr.evlist_cpu_map_idx = itr.evlist_cpu_map_nr;
-	} else {
-		itr.evsel = evlist__first(evlist);
-		if (itr.affinity) {
-			itr.cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0);
-			affinity__set(itr.affinity, itr.cpu.cpu);
-			itr.cpu_map_idx = perf_cpu_map__idx(itr.evsel->core.cpus, itr.cpu);
-			/*
-			 * If this CPU isn't in the evsel's cpu map then advance
-			 * through the list.
-			 */
-			if (itr.cpu_map_idx == -1)
-				evlist_cpu_iterator__next(&itr);
-		}
+		itr->evlist_cpu_map_idx = itr->evlist_cpu_map_nr;
+		return;
 	}
-	return itr;
+
+	if (evlist__use_affinity(evlist)) {
+		if (affinity__setup(&itr->saved_affinity) == 0)
+			itr->affinity = &itr->saved_affinity;
+	}
+	itr->evsel = evlist__first(evlist);
+	itr->cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0);
+	if (itr->affinity)
+		affinity__set(itr->affinity, itr->cpu.cpu);
+	itr->cpu_map_idx = perf_cpu_map__idx(itr->evsel->core.cpus, itr->cpu);
+	/*
+	 * If this CPU isn't in the evsel's cpu map then advance
+	 * through the list.
+	 */
+	if (itr->cpu_map_idx == -1)
+		evlist_cpu_iterator__next(itr);
+}
+
+void evlist_cpu_iterator__exit(struct evlist_cpu_iterator *itr)
+{
+	if (!itr->affinity)
+		return;
+
+	affinity__cleanup(itr->affinity);
+	itr->affinity = NULL;
 }
 
 void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr)
@@ -418,14 +493,11 @@ void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr)
 		 */
 		if (evlist_cpu_itr->cpu_map_idx == -1)
 			evlist_cpu_iterator__next(evlist_cpu_itr);
+	} else {
+		evlist_cpu_iterator__exit(evlist_cpu_itr);
 	}
 }
 
-bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr)
-{
-	return evlist_cpu_itr->evlist_cpu_map_idx >= evlist_cpu_itr->evlist_cpu_map_nr;
-}
-
 static int evsel__strcmp(struct evsel *pos, char *evsel_name)
 {
 	if (!evsel_name)
@@ -453,19 +525,11 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name, bool excl
 {
 	struct evsel *pos;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity = NULL;
 	bool has_imm = false;
 
-	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.user_requested_cpus)) {
-		if (affinity__setup(&saved_affinity) < 0)
-			return;
-		affinity = &saved_affinity;
-	}
-
 	/* Disable 'immediate' events last */
 	for (int imm = 0; imm <= 1; imm++) {
-		evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity) {
+		evlist__for_each_cpu(evlist_cpu_itr, evlist) {
 			pos = evlist_cpu_itr.evsel;
 			if (evsel__strcmp(pos, evsel_name))
 				continue;
@@ -483,7 +547,6 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name, bool excl
 			break;
 	}
 
-	affinity__cleanup(affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (evsel__strcmp(pos, evsel_name))
 			continue;
@@ -523,16 +586,8 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name, bool excl_
 {
 	struct evsel *pos;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity saved_affinity, *affinity = NULL;
 
-	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.user_requested_cpus)) {
-		if (affinity__setup(&saved_affinity) < 0)
-			return;
-		affinity = &saved_affinity;
-	}
-
-	evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity) {
+	evlist__for_each_cpu(evlist_cpu_itr, evlist) {
 		pos = evlist_cpu_itr.evsel;
 		if (evsel__strcmp(pos, evsel_name))
 			continue;
@@ -542,7 +597,6 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name, bool excl_
 			continue;
 		evsel__enable_cpu(pos, evlist_cpu_itr.cpu_map_idx);
 	}
-	affinity__cleanup(affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (evsel__strcmp(pos, evsel_name))
 			continue;
@@ -1339,30 +1393,14 @@ void evlist__close(struct evlist *evlist)
 {
 	struct evsel *evsel;
 	struct evlist_cpu_iterator evlist_cpu_itr;
-	struct affinity affinity;
-
-	/*
-	 * With perf record core.user_requested_cpus is usually NULL.
-	 * Use the old method to handle this for now.
-	 */
-	if (!evlist->core.user_requested_cpus ||
-	    cpu_map__is_dummy(evlist->core.user_requested_cpus)) {
-		evlist__for_each_entry_reverse(evlist, evsel)
-			evsel__close(evsel);
-		return;
-	}
-
-	if (affinity__setup(&affinity) < 0)
-		return;
 
-	evlist__for_each_cpu(evlist_cpu_itr, evlist, &affinity) {
+	evlist__for_each_cpu(evlist_cpu_itr, evlist) {
 		if (evlist_cpu_itr.cpu_map_idx == 0 && evsel__is_retire_lat(evlist_cpu_itr.evsel))
 			evsel__tpebs_close(evlist_cpu_itr.evsel);
 		perf_evsel__close_cpu(&evlist_cpu_itr.evsel->core,
 				      evlist_cpu_itr.cpu_map_idx);
 	}
 
-	affinity__cleanup(&affinity);
 	evlist__for_each_entry_reverse(evlist, evsel) {
 		perf_evsel__free_fd(&evsel->core);
 		perf_evsel__free_id(&evsel->core);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 911834ae7c2a..30dff7484d3c 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include "affinity.h"
 #include "events_stats.h"
 #include "evsel.h"
 #include "rblist.h"
@@ -363,6 +364,8 @@ struct evlist_cpu_iterator {
 	struct perf_cpu cpu;
 	/** If present, used to set the affinity when switching between CPUs. */
 	struct affinity *affinity;
+	/** May be used to hold affinity state prior to iterating. */
+	struct affinity saved_affinity;
 };
 
 /**
@@ -370,22 +373,31 @@ struct evlist_cpu_iterator {
  *                        affinity, iterate over all CPUs and then the evlist
  *                        for each evsel on that CPU. When switching between
  *                        CPUs the affinity is set to the CPU to avoid IPIs
- *                        during syscalls.
+ *                        during syscalls. The affinity is set up and removed
+ *                        automatically; if the loop is broken a call to
+ *                        evlist_cpu_iterator__exit is necessary.
  * @evlist_cpu_itr: the iterator instance.
  * @evlist: evlist instance to iterate.
- * @affinity: NULL or used to set the affinity to the current CPU.
 */
-#define evlist__for_each_cpu(evlist_cpu_itr, evlist, affinity)		\
-	for ((evlist_cpu_itr) = evlist__cpu_begin(evlist, affinity);	\
+#define evlist__for_each_cpu(evlist_cpu_itr, evlist)			\
+	for (evlist_cpu_iterator__init(&(evlist_cpu_itr), evlist);	\
	     !evlist_cpu_iterator__end(&evlist_cpu_itr);		\
	     evlist_cpu_iterator__next(&evlist_cpu_itr))
 
-/** Returns an iterator set to the first CPU/evsel of evlist. */
-struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affinity *affinity);
+/** Setup an iterator set to the first CPU/evsel of evlist. */
+void evlist_cpu_iterator__init(struct evlist_cpu_iterator *itr, struct evlist *evlist);
+/**
+ * Cleans up the iterator, automatically done by evlist_cpu_iterator__next when
+ * the end of the list is reached. Multiple calls are safe.
+ */
+void evlist_cpu_iterator__exit(struct evlist_cpu_iterator *itr);
 /** Move to next element in iterator, updating CPU, evsel and the affinity. */
 void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr);
 /** Returns true when iterator is at the end of the CPUs and evlist. */
-bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr);
+static inline bool evlist_cpu_iterator__end(const struct evlist_cpu_iterator *evlist_cpu_itr)
+{
+	return evlist_cpu_itr->evlist_cpu_map_idx >= evlist_cpu_itr->evlist_cpu_map_nr;
+}
 
 struct evsel *evlist__get_tracking_event(struct evlist *evlist);
 void evlist__set_tracking_event(struct evlist *evlist, struct evsel *tracking_evsel);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 956ea273c2c7..853b8addead6 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -2420,6 +2420,18 @@ bool perf_pmu__is_software(const struct perf_pmu *pmu)
 	return false;
 }
 
+bool perf_pmu__benefits_from_affinity(struct perf_pmu *pmu)
+{
+	if (!pmu)
+		return true; /* Assume is core. */
+
+	/*
+	 * All perf event PMUs should benefit from accessing the perf event
+	 * contexts on the local CPU.
+	 */
+	return pmu->type <= PERF_PMU_TYPE_PE_END;
+}
+
 FILE *perf_pmu__open_file(const struct perf_pmu *pmu, const char *name)
 {
 	char path[PATH_MAX];
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8f11bfe8ed6d..689542581429 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -273,6 +273,7 @@ bool perf_pmu__name_no_suffix_match(const struct perf_pmu *pmu, const char *to_m
  * perf_sw_context in the kernel?
  */
 bool perf_pmu__is_software(const struct perf_pmu *pmu);
+bool perf_pmu__benefits_from_affinity(struct perf_pmu *pmu);
 
 FILE *perf_pmu__open_file(const struct perf_pmu *pmu, const char *name);
 FILE *perf_pmu__open_file_at(const struct perf_pmu *pmu, int dirfd, const char *name);
-- 
2.52.0.457.g6b5491de43-goog
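The practical rule introduced by this patch is spelled out in the evlist.h comment: affinity setup and teardown now live inside the iterator, so only a caller that leaves the loop early must release the saved affinity itself. The sketch below is a hypothetical caller written against the new interface to make that rule concrete; it is not code from the patch, and evsel__open_per_cpu is used merely as a plausible per-CPU operation.

/*
 * Hypothetical caller of the reworked iterator (illustrative only).
 * A loop that runs to completion needs no affinity handling at all,
 * but an early exit must call evlist_cpu_iterator__exit() to restore
 * the thread's original CPU affinity.
 */
static int open_all_or_bail(struct evlist *evlist)
{
	struct evlist_cpu_iterator itr;

	evlist__for_each_cpu(itr, evlist) {
		struct evsel *evsel = itr.evsel;
		int err = evsel__open_per_cpu(evsel, evsel->core.cpus, itr.cpu_map_idx);

		if (err) {
			/* Breaking out early: undo the iterator's affinity change. */
			evlist_cpu_iterator__exit(&itr);
			return err;
		}
	}
	/* Fell off the end: evlist_cpu_iterator__next() already cleaned up. */
	return 0;
}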
From nobody Sun Feb 8 11:25:56 2026
Date: Thu, 8 Jan 2026 13:26:52 -0800
In-Reply-To: <20260108212652.768875-1-irogers@google.com>
References: <20260108212652.768875-1-irogers@google.com>
Message-ID: <20260108212652.768875-4-irogers@google.com>
X-Mailer: git-send-email 2.52.0.457.g6b5491de43-goog
Subject: [PATCH v6 3/3] perf stat: Add no-affinity flag
From: Ian Rogers
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
 "Dr. David Alan Gilbert", Yang Li, James Clark, Thomas Falcon,
 Thomas Richter, linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, Andi Kleen, Dapeng Mi
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a flag that disables the affinity behavior. Using sched_setaffinity
to place the perf thread on a CPU can avoid certain interprocessor
interrupts, but may introduce a delay from waiting to be scheduled,
particularly on loaded machines. Add a command line option to disable
the behavior. The issue is less pronounced in tools like `perf record`,
which use a ring buffer and don't make repeated system calls.
Signed-off-by: Ian Rogers
---
 tools/perf/Documentation/perf-stat.txt | 4 ++++
 tools/perf/builtin-stat.c              | 6 ++++++
 tools/perf/util/evlist.c               | 6 +-----
 tools/perf/util/evlist.h               | 1 +
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 1a766d4a2233..1ffb510606af 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -382,6 +382,10 @@ color the metric's computed value.
 Don't print output, warnings or messages. This is useful with perf stat
 record below to only write data to the perf.data file.
 
+--no-affinity::
+Don't change scheduler affinities when iterating over CPUs. Disables
+an optimization aimed at minimizing interprocessor interrupts.
+
 STAT RECORD
 -----------
 Stores stat data into perf data file.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bb14268e7393..ddda0ea62eaf 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2427,6 +2427,7 @@ static int parse_tpebs_mode(const struct option *opt, const char *str,
 int cmd_stat(int argc, const char **argv)
 {
 	struct opt_aggr_mode opt_mode = {};
+	bool affinity = true, affinity_set = false;
 	struct option stat_options[] = {
 		OPT_BOOLEAN('T', "transaction", &transaction_run,
 			    "hardware transaction statistics"),
@@ -2555,6 +2556,8 @@ int cmd_stat(int argc, const char **argv)
 			    "don't print 'summary' for CSV summary output"),
 		OPT_BOOLEAN(0, "quiet", &quiet,
 			"don't print any output, messages or warnings (useful with record)"),
+		OPT_BOOLEAN_SET(0, "affinity", &affinity, &affinity_set,
+			"don't allow affinity optimizations aimed at reducing IPIs"),
 		OPT_CALLBACK(0, "cputype", &evsel_list, "hybrid cpu type",
 			     "Only enable events on applying cpu with this type "
 			     "for hybrid platform (e.g. core or atom)",
@@ -2612,6 +2615,9 @@ int cmd_stat(int argc, const char **argv)
 	} else
 		stat_config.csv_sep = DEFAULT_SEPARATOR;
 
+	if (affinity_set)
+		evsel_list->no_affinity = !affinity;
+
 	if (argc && strlen(argv[0]) > 2 && strstarts("record", argv[0])) {
 		argc = __cmd_record(stat_options, &opt_mode, argc, argv);
 		if (argc < 0)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index d62b8bab8fa4..00fb3cf45bae 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -369,11 +369,7 @@ static bool evlist__use_affinity(struct evlist *evlist)
 	struct perf_cpu_map *used_cpus = NULL;
 	bool ret = false;
 
-	/*
-	 * With perf record core.user_requested_cpus is usually NULL.
-	 * Use the old method to handle this for now.
-	 */
-	if (!evlist->core.user_requested_cpus ||
+	if (evlist->no_affinity || !evlist->core.user_requested_cpus ||
 	    cpu_map__is_dummy(evlist->core.user_requested_cpus))
 		return false;
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 30dff7484d3c..d17c3b57a409 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -59,6 +59,7 @@ struct event_enable_timer;
 struct evlist {
 	struct perf_evlist core;
 	bool enabled;
+	bool no_affinity;
 	int id_pos;
 	int is_pos;
 	int nr_br_cntr;
-- 
2.52.0.457.g6b5491de43-goog
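For illustration, a possible invocation of the new option is shown below. The events, interval and workload are arbitrary choices, not taken from the patch; the point is only that `--no-affinity` keeps the perf thread from being migrated between CPUs on each counter read.

# Hypothetical usage sketch: system-wide interval counting with the
# sched_setaffinity optimization disabled.
perf stat --no-affinity -a -I 1000 -e cycles,instructions -- sleep 5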