Perf stat with no arguments will use default events and metrics. These
events may fail to open even with kernel and hypervisor sampling
disabled. When they fail, the permissions error is reported even though
the user never explicitly selected the events. This is particularly a
problem with the automatic selection of the TopdownL1 metric group on
certain architectures like Skylake:
'''
$ perf stat true
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 2:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
'''
This patch adds skippable evsels that, when they fail to open, won't
cause termination and will appear as "<not supported>" in the output.
The TopdownL1 events from the metric group are marked as skippable.
This turns the failure above into:
'''
$ perf stat perf bench internals synthesize
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 49.287 usec (+- 0.083 usec)
Average num. events: 3.000 (+- 0.000)
Average time per event 16.429 usec
Average data synthesis took: 49.641 usec (+- 0.085 usec)
Average num. events: 11.000 (+- 0.000)
Average time per event 4.513 usec
Performance counter stats for 'perf bench internals synthesize':
1,222.38 msec task-clock:u # 0.993 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
162 page-faults:u # 132.529 /sec
774,445,184 cycles:u # 0.634 GHz (49.61%)
1,640,969,811 instructions:u # 2.12 insn per cycle (59.67%)
302,052,148 branches:u # 247.102 M/sec (59.69%)
1,807,718 branch-misses:u # 0.60% of all branches (59.68%)
5,218,927 CPU_CLK_UNHALTED.REF_XCLK:u # 4.269 M/sec
# 17.3 % tma_frontend_bound
# 56.4 % tma_retiring
# nan % tma_backend_bound
# nan % tma_bad_speculation (60.01%)
536,580,469 IDQ_UOPS_NOT_DELIVERED.CORE:u # 438.965 M/sec (60.33%)
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
5,223,936 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u # 4.274 M/sec (40.31%)
774,127,250 CPU_CLK_UNHALTED.THREAD:u # 633.297 M/sec (50.34%)
1,746,579,518 UOPS_RETIRED.RETIRE_SLOTS:u # 1.429 G/sec (50.12%)
1,940,625,702 UOPS_ISSUED.ANY:u # 1.588 G/sec (49.70%)
1.231055525 seconds time elapsed
0.258327000 seconds user
0.965749000 seconds sys
'''
The event INT_MISC.RECOVERY_CYCLES_ANY:u is skipped as it can't be
opened with paranoia level 2 on Skylake. With a lower paranoia level,
or as root, all events/metrics are computed.
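For reference, the paranoia level can be inspected and lowered exactly
as the error message above suggests (a sketch; the level needed for a
given event is machine-dependent, 1 is just an example):
'''
$ cat /proc/sys/kernel/perf_event_paranoid
2
$ sudo sysctl kernel.perf_event_paranoid=1   # lower paranoia so the skipped event can open
$ perf stat true                             # TopdownL1 metrics now fully computed
'''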
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 38 +++++++++++++++++++++++++++++---------
tools/perf/util/evsel.c | 15 +++++++++++++--
tools/perf/util/evsel.h | 1 +
3 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index be9677aa642f..ffb47b166098 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -667,6 +667,13 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
evsel_list->core.threads->err_thread = -1;
return COUNTER_RETRY;
}
+ } else if (counter->skippable) {
+ if (verbose > 0)
+ ui__warning("skipping event %s that kernel failed to open .\n",
+ evsel__name(counter));
+ counter->supported = false;
+ counter->errored = true;
+ return COUNTER_SKIP;
}
evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
@@ -1890,15 +1897,28 @@ static int add_default_attributes(void)
* caused by exposing latent bugs. This is fixed properly in:
* https://lore.kernel.org/lkml/bff481ba-e60a-763f-0aa0-3ee53302c480@linux.intel.com/
*/
- if (metricgroup__has_metric("TopdownL1") && !perf_pmu__has_hybrid() &&
- metricgroup__parse_groups(evsel_list, "TopdownL1",
- /*metric_no_group=*/false,
- /*metric_no_merge=*/false,
- /*metric_no_threshold=*/true,
- stat_config.user_requested_cpu_list,
- stat_config.system_wide,
- &stat_config.metric_events) < 0)
- return -1;
+ if (metricgroup__has_metric("TopdownL1") && !perf_pmu__has_hybrid()) {
+ struct evlist *metric_evlist = evlist__new();
+ struct evsel *metric_evsel;
+
+ if (!metric_evlist)
+ return -1;
+
+ if (metricgroup__parse_groups(metric_evlist, "TopdownL1",
+ /*metric_no_group=*/false,
+ /*metric_no_merge=*/false,
+ /*metric_no_threshold=*/true,
+ stat_config.user_requested_cpu_list,
+ stat_config.system_wide,
+ &stat_config.metric_events) < 0)
+ return -1;
+
+ evlist__for_each_entry(metric_evlist, metric_evsel) {
+ metric_evsel->skippable = true;
+ }
+ evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries);
+ evlist__delete(metric_evlist);
+ }
/* Platform specific attrs */
if (evlist__add_default_attrs(evsel_list, default_null_attrs) < 0)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 356c07f03be6..1cd04b5998d2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -290,6 +290,7 @@ void evsel__init(struct evsel *evsel,
evsel->per_pkg_mask = NULL;
evsel->collect_stat = false;
evsel->pmu_name = NULL;
+ evsel->skippable = false;
}
struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -1725,9 +1726,13 @@ static int get_group_fd(struct evsel *evsel, int cpu_map_idx, int thread)
return -1;
fd = FD(leader, cpu_map_idx, thread);
- BUG_ON(fd == -1);
+ BUG_ON(fd == -1 && !leader->skippable);
- return fd;
+ /*
+ * When the leader has been skipped, return -2 to distinguish from no
+ * group leader case.
+ */
+ return fd == -1 ? -2 : fd;
}
static void evsel__remove_fd(struct evsel *pos, int nr_cpus, int nr_threads, int thread_idx)
@@ -2109,6 +2114,12 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
group_fd = get_group_fd(evsel, idx, thread);
+ if (group_fd == -2) {
+ pr_debug("broken group leader for %s\n", evsel->name);
+ err = -EINVAL;
+ goto out_close;
+ }
+
test_attr__ready();
/* Debug message used by test scripts */
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 35805dcdb1b9..bf8f01af1c0b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -95,6 +95,7 @@ struct evsel {
bool weak_group;
bool bpf_counter;
bool use_config_name;
+ bool skippable;
int bpf_fd;
struct bpf_object *bpf_obj;
struct list_head config_terms;
--
2.40.1.495.gc816e09b53d-goog
On 2023-04-29 1:34 a.m., Ian Rogers wrote:
> Perf stat with no arguments will use default events and metrics. These
> events may fail to open even with kernel and hypervisor disabled. When
> these fail then the permissions error appears even though they were
> implicitly selected. This is particularly a problem with the automatic
> selection of the TopdownL1 metric group on certain architectures like
> Skylake:
>
> '''
> $ perf stat true
> Error:
> Access to performance monitoring and observability operations is limited.
> Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> access to performance monitoring and observability operations for processes
> without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> More information can be found at 'Perf events and tool security' document:
> https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> perf_event_paranoid setting is 2:
> -1: Allow use of (almost) all events by all users
> Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>> = 0: Disallow raw and ftrace function tracepoint access
>> = 1: Disallow CPU event access
>> = 2: Disallow kernel profiling
> To make the adjusted perf_event_paranoid setting permanent preserve it
> in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> '''
>
> This patch adds skippable evsels that when they fail to open won't
> cause termination and will appear as "<not supported>" in output. The
> TopdownL1 events, from the metric group, are marked as skippable. This
> turns the failure above to:
>
> '''
> $ perf stat perf bench internals synthesize
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 49.287 usec (+- 0.083 usec)
> Average num. events: 3.000 (+- 0.000)
> Average time per event 16.429 usec
> Average data synthesis took: 49.641 usec (+- 0.085 usec)
> Average num. events: 11.000 (+- 0.000)
> Average time per event 4.513 usec
>
> Performance counter stats for 'perf bench internals synthesize':
>
> 1,222.38 msec task-clock:u # 0.993 CPUs utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 162 page-faults:u # 132.529 /sec
> 774,445,184 cycles:u # 0.634 GHz (49.61%)
> 1,640,969,811 instructions:u # 2.12 insn per cycle (59.67%)
> 302,052,148 branches:u # 247.102 M/sec (59.69%)
> 1,807,718 branch-misses:u # 0.60% of all branches (59.68%)
> 5,218,927 CPU_CLK_UNHALTED.REF_XCLK:u # 4.269 M/sec
> # 17.3 % tma_frontend_bound
> # 56.4 % tma_retiring
> # nan % tma_backend_bound
> # nan % tma_bad_speculation (60.01%)
> 536,580,469 IDQ_UOPS_NOT_DELIVERED.CORE:u # 438.965 M/sec (60.33%)
> <not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
> 5,223,936 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u # 4.274 M/sec (40.31%)
> 774,127,250 CPU_CLK_UNHALTED.THREAD:u # 633.297 M/sec (50.34%)
> 1,746,579,518 UOPS_RETIRED.RETIRE_SLOTS:u # 1.429 G/sec (50.12%)
> 1,940,625,702 UOPS_ISSUED.ANY:u # 1.588 G/sec (49.70%)
>
> 1.231055525 seconds time elapsed
>
> 0.258327000 seconds user
> 0.965749000 seconds sys
Which branch is this patch series based on?
I still cannot get the same output as the examples.
I'm using the latest perf-tools-next (the latest commit ID is
5d27a645f609 ("perf tracepoint: Fix memory leak in is_valid_tracepoint()")).
I only applied patch 2 and patch 3, since patch 1 is already merged.
It's a single-socket Cascade Lake with kernel 5.19.8.
$ uname -r
5.19.8-100.fc35.x86_64
As you can see, all the topdown-related events are displayed twice.
With root permission,
$ sudo ./perf stat perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 91.487 usec (+- 0.050 usec)
Average num. events: 47.000 (+- 0.000)
Average time per event 1.947 usec
Average data synthesis took: 97.720 usec (+- 0.059 usec)
Average num. events: 245.000 (+- 0.000)
Average time per event 0.399 usec
Performance counter stats for 'perf bench internals synthesize':
2,077.81 msec task-clock # 0.998 CPUs utilized
466 context-switches # 224.274 /sec
4 cpu-migrations # 1.925 /sec
775 page-faults # 372.988 /sec
9,561,957,326 cycles # 4.602 GHz (31.17%)
24,466,854,021 instructions # 2.56 insn per cycle (37.42%)
5,547,892,196 branches # 2.670 G/sec (37.48%)
37,880,526 branch-misses # 0.68% of all branches (37.52%)
49,576,109 CPU_CLK_UNHALTED.REF_XCLK # 23.860 M/sec
# 59.9 % tma_retiring
# 4.6 % tma_bad_speculation (37.47%)
228,406,003 INT_MISC.RECOVERY_CYCLES_ANY # 109.926 M/sec (37.52%)
49,591,815 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE # 23.867 M/sec (24.99%)
9,553,472,893 CPU_CLK_UNHALTED.THREAD # 4.598 G/sec (31.25%)
22,893,372,651 UOPS_RETIRED.RETIRE_SLOTS # 11.018 G/sec (31.23%)
24,180,375,299 UOPS_ISSUED.ANY # 11.637 G/sec (31.25%)
49,562,300 CPU_CLK_UNHALTED.REF_XCLK # 23.853 M/sec
# 28.1 % tma_frontend_bound
# 7.2 % tma_backend_bound (31.24%)
10,735,205,084 IDQ_UOPS_NOT_DELIVERED.CORE # 5.167 G/sec (31.30%)
228,798,426 INT_MISC.RECOVERY_CYCLES_ANY # 110.115 M/sec (25.04%)
49,559,962 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE # 23.852 M/sec (25.00%)
9,538,354,333 CPU_CLK_UNHALTED.THREAD # 4.591 G/sec (31.29%)
24,207,967,071 UOPS_ISSUED.ANY # 11.651 G/sec (31.24%)
2.082670856 seconds time elapsed
0.812763000 seconds user
1.252387000 seconds sys
With non-root, nothing is counted for the TopdownL1 events.
$ ./perf stat perf bench internals synthesize
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 91.852 usec (+- 0.139 usec)
Average num. events: 47.000 (+- 0.000)
Average time per event 1.954 usec
Average data synthesis took: 96.230 usec (+- 0.046 usec)
Average num. events: 245.000 (+- 0.000)
Average time per event 0.393 usec
Performance counter stats for 'perf bench internals synthesize':
2,051.95 msec task-clock:u # 0.997 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
765 page-faults:u # 372.816 /sec
3,601,662,523 cycles:u # 1.755 GHz (16.72%)
9,241,811,003 instructions:u # 2.57 insn per cycle (33.43%)
2,238,848,485 branches:u # 1.091 G/sec (50.06%)
19,966,181 branch-misses:u # 0.89% of all branches (66.77%)
<not counted> CPU_CLK_UNHALTED.REF_XCLK:u
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
<not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
<not counted> CPU_CLK_UNHALTED.THREAD:u
<not counted> UOPS_RETIRED.RETIRE_SLOTS:u
<not counted> UOPS_ISSUED.ANY:u
<not counted> CPU_CLK_UNHALTED.REF_XCLK:u
<not counted> IDQ_UOPS_NOT_DELIVERED.CORE:u
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
<not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
<not counted> CPU_CLK_UNHALTED.THREAD:u
<not counted> UOPS_ISSUED.ANY:u
2.057691297 seconds time elapsed
0.766640000 seconds user
1.275170000 seconds sys
Thanks,
Kan
On Mon, May 1, 2023 at 7:56 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2023-04-29 1:34 a.m., Ian Rogers wrote:
> > Perf stat with no arguments will use default events and metrics. These
> > events may fail to open even with kernel and hypervisor disabled. When
> > these fail then the permissions error appears even though they were
> > implicitly selected. This is particularly a problem with the automatic
> > selection of the TopdownL1 metric group on certain architectures like
> > Skylake:
> >
> > '''
> > $ perf stat true
> > Error:
> > Access to performance monitoring and observability operations is limited.
> > Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
> > access to performance monitoring and observability operations for processes
> > without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
> > More information can be found at 'Perf events and tool security' document:
> > https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
> > perf_event_paranoid setting is 2:
> > -1: Allow use of (almost) all events by all users
> > Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> >> = 0: Disallow raw and ftrace function tracepoint access
> >> = 1: Disallow CPU event access
> >> = 2: Disallow kernel profiling
> > To make the adjusted perf_event_paranoid setting permanent preserve it
> > in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
> > '''
> >
> > This patch adds skippable evsels that when they fail to open won't
> > cause termination and will appear as "<not supported>" in output. The
> > TopdownL1 events, from the metric group, are marked as skippable. This
> > turns the failure above to:
> >
> > '''
> > $ perf stat perf bench internals synthesize
> > Computing performance of single threaded perf event synthesis by
> > synthesizing events on the perf process itself:
> > Average synthesis took: 49.287 usec (+- 0.083 usec)
> > Average num. events: 3.000 (+- 0.000)
> > Average time per event 16.429 usec
> > Average data synthesis took: 49.641 usec (+- 0.085 usec)
> > Average num. events: 11.000 (+- 0.000)
> > Average time per event 4.513 usec
> >
> > Performance counter stats for 'perf bench internals synthesize':
> >
> > 1,222.38 msec task-clock:u # 0.993 CPUs utilized
> > 0 context-switches:u # 0.000 /sec
> > 0 cpu-migrations:u # 0.000 /sec
> > 162 page-faults:u # 132.529 /sec
> > 774,445,184 cycles:u # 0.634 GHz (49.61%)
> > 1,640,969,811 instructions:u # 2.12 insn per cycle (59.67%)
> > 302,052,148 branches:u # 247.102 M/sec (59.69%)
> > 1,807,718 branch-misses:u # 0.60% of all branches (59.68%)
> > 5,218,927 CPU_CLK_UNHALTED.REF_XCLK:u # 4.269 M/sec
> > # 17.3 % tma_frontend_bound
> > # 56.4 % tma_retiring
> > # nan % tma_backend_bound
> > # nan % tma_bad_speculation (60.01%)
> > 536,580,469 IDQ_UOPS_NOT_DELIVERED.CORE:u # 438.965 M/sec (60.33%)
> > <not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
> > 5,223,936 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u # 4.274 M/sec (40.31%)
> > 774,127,250 CPU_CLK_UNHALTED.THREAD:u # 633.297 M/sec (50.34%)
> > 1,746,579,518 UOPS_RETIRED.RETIRE_SLOTS:u # 1.429 G/sec (50.12%)
> > 1,940,625,702 UOPS_ISSUED.ANY:u # 1.588 G/sec (49.70%)
> >
> > 1.231055525 seconds time elapsed
> >
> > 0.258327000 seconds user
> > 0.965749000 seconds sys
>
>
> Which branch is this patch series based on?
>
> I still cannot get the same output as the examples.
>
> I'm using the latest perf-tools-next (The latest commit ID is
> 5d27a645f609 ("perf tracepoint: Fix memory leak in is_valid_tracepoint()")).
> I only applied patch 2 and patch 3, since the patch 1 is already merged.
>
> It's a single socket Cascade Lake. with kernel 5.19-8.
> $ uname -r
> 5.19.8-100.fc35.x86_64
>
> As you can see, all the topdown related events are displayed twice.
>
> With root permission,
>
> $ sudo ./perf stat perf bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 91.487 usec (+- 0.050 usec)
> Average num. events: 47.000 (+- 0.000)
> Average time per event 1.947 usec
> Average data synthesis took: 97.720 usec (+- 0.059 usec)
> Average num. events: 245.000 (+- 0.000)
> Average time per event 0.399 usec
>
> Performance counter stats for 'perf bench internals synthesize':
>
> 2,077.81 msec task-clock # 0.998 CPUs
> utilized
> 466 context-switches # 224.274 /sec
> 4 cpu-migrations # 1.925 /sec
> 775 page-faults # 372.988 /sec
> 9,561,957,326 cycles # 4.602 GHz
> (31.17%)
> 24,466,854,021 instructions # 2.56 insn
> per cycle (37.42%)
> 5,547,892,196 branches # 2.670
> G/sec (37.48%)
> 37,880,526 branch-misses # 0.68% of
> all branches (37.52%)
> 49,576,109 CPU_CLK_UNHALTED.REF_XCLK # 23.860 M/sec
> # 59.9 % tma_retiring
> # 4.6 %
> tma_bad_speculation (37.47%)
> 228,406,003 INT_MISC.RECOVERY_CYCLES_ANY # 109.926
> M/sec (37.52%)
> 49,591,815 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE # 23.867
> M/sec (24.99%)
> 9,553,472,893 CPU_CLK_UNHALTED.THREAD # 4.598
> G/sec (31.25%)
> 22,893,372,651 UOPS_RETIRED.RETIRE_SLOTS # 11.018
> G/sec (31.23%)
> 24,180,375,299 UOPS_ISSUED.ANY # 11.637
> G/sec (31.25%)
> 49,562,300 CPU_CLK_UNHALTED.REF_XCLK # 23.853 M/sec
> # 28.1 %
> tma_frontend_bound
> # 7.2 %
> tma_backend_bound (31.24%)
> 10,735,205,084 IDQ_UOPS_NOT_DELIVERED.CORE # 5.167
> G/sec (31.30%)
> 228,798,426 INT_MISC.RECOVERY_CYCLES_ANY # 110.115
> M/sec (25.04%)
> 49,559,962 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE # 23.852
> M/sec (25.00%)
> 9,538,354,333 CPU_CLK_UNHALTED.THREAD # 4.591
> G/sec (31.29%)
> 24,207,967,071 UOPS_ISSUED.ANY # 11.651
> G/sec (31.24%)
>
> 2.082670856 seconds time elapsed
>
> 0.812763000 seconds user
> 1.252387000 seconds sys
The events are displayed twice as there are 2 groups of events. This
is changed by:
https://lore.kernel.org/lkml/20230429053506.1962559-5-irogers@google.com/
where the events are no longer grouped.
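For illustration, grouping is what makes the difference here: events in
a brace-delimited group are scheduled on the PMU as a unit, while
ungrouped events multiplex independently (a sketch using generic
events, not the exact TopdownL1 list):
'''
$ perf stat -e '{cycles,instructions}' true   # one group: members scheduled, and reported, together
$ perf stat -e cycles,instructions true       # no braces: each event counts on its own
'''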
> With non-root, nothing is counted for the topdownL1 events.
>
> $ ./perf stat perf bench internals synthesize
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 91.852 usec (+- 0.139 usec)
> Average num. events: 47.000 (+- 0.000)
> Average time per event 1.954 usec
> Average data synthesis took: 96.230 usec (+- 0.046 usec)
> Average num. events: 245.000 (+- 0.000)
> Average time per event 0.393 usec
>
> Performance counter stats for 'perf bench internals synthesize':
>
> 2,051.95 msec task-clock:u # 0.997 CPUs
> utilized
> 0 context-switches:u # 0.000 /sec
> 0 cpu-migrations:u # 0.000 /sec
> 765 page-faults:u # 372.816 /sec
> 3,601,662,523 cycles:u # 1.755 GHz
> (16.72%)
> 9,241,811,003 instructions:u # 2.57 insn
> per cycle (33.43%)
> 2,238,848,485 branches:u # 1.091
> G/sec (50.06%)
> 19,966,181 branch-misses:u # 0.89% of
> all branches (66.77%)
> <not counted> CPU_CLK_UNHALTED.REF_XCLK:u
> <not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
> <not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
> <not counted> CPU_CLK_UNHALTED.THREAD:u
> <not counted> UOPS_RETIRED.RETIRE_SLOTS:u
> <not counted> UOPS_ISSUED.ANY:u
> <not counted> CPU_CLK_UNHALTED.REF_XCLK:u
> <not counted> IDQ_UOPS_NOT_DELIVERED.CORE:u
> <not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
> <not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u
> <not counted> CPU_CLK_UNHALTED.THREAD:u
> <not counted> UOPS_ISSUED.ANY:u
>
> 2.057691297 seconds time elapsed
>
> 0.766640000 seconds user
> 1.275170000 seconds sys
The reason nothing is counted is that all the metrics are trying to
share groups that contain the event INT_MISC.RECOVERY_CYCLES_ANY,
which means the whole group ends up being not supported. Again, the
patch above that removes the groups for TopdownL1 and TopdownL2, as
requested by Weilin, will address the issue.
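One way to observe the group-level failure directly is perf's weak
group modifier, which retries a failing group with its members split up
(a sketch; the event pair is taken from the output above, and exact
behaviour depends on the paranoia level):
'''
$ perf stat -e '{CPU_CLK_UNHALTED.THREAD,INT_MISC.RECOVERY_CYCLES_ANY}' true    # one member fails, whole group fails
$ perf stat -e '{CPU_CLK_UNHALTED.THREAD,INT_MISC.RECOVERY_CYCLES_ANY}:W' true  # weak group: retried ungrouped on failure
'''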
Thanks,
Ian
> Thanks,
> Kan
>
On 2023-05-01 11:29 a.m., Ian Rogers wrote:
> The events are displayed twice as there are 2 groups of events. This
> is changed by:
> https://lore.kernel.org/lkml/20230429053506.1962559-5-irogers@google.com/
> where the events are no longer grouped.
The trick doesn't seem to work on a hybrid machine. I still get
duplicate Topdown events on the e-core.
38,841.16 msec cpu-clock # 32.009 CPUs utilized
256 context-switches # 6.591 /sec
33 cpu-migrations # 0.850 /sec
84 page-faults # 2.163 /sec
21,910,584 cpu_core/cycles/ # 564.107 K/sec
248,153,249 cpu_atom/cycles/ # 6.389 M/sec (53.85%)
27,463,908 cpu_core/instructions/ # 707.083 K/sec
118,661,014 cpu_atom/instructions/ # 3.055 M/sec (63.06%)
4,652,941 cpu_core/branches/ # 119.794 K/sec
20,173,082 cpu_atom/branches/ # 519.374 K/sec (63.18%)
72,727 cpu_core/branch-misses/ # 1.872 K/sec
1,143,187 cpu_atom/branch-misses/ # 29.432 K/sec (63.51%)
125,630,586 cpu_core/TOPDOWN.SLOTS/ # nan % tma_backend_bound
# nan % tma_retiring
# 0.0 % tma_bad_speculation
# nan % tma_frontend_bound
30,254,701 cpu_core/topdown-retiring/
149,075,726 cpu_atom/TOPDOWN_RETIRING.ALL/ # 3.838 M/sec
# 14.8 % tma_bad_speculation (63.82%)
<not supported> cpu_core/topdown-bad-spec/
523,614,383 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 13.481 M/sec
# 42.0 % tma_frontend_bound (64.15%)
385,502,477 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.925 M/sec
# 30.9 % tma_backend_bound
# 30.9 % tma_backend_bound_aux (64.39%)
249,534,488 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 6.424 M/sec
# 12.2 % tma_retiring (64.18%)
151,729,465 cpu_atom/TOPDOWN_RETIRING.ALL/ # 3.906 M/sec (54.67%)
530,621,769 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 13.661 M/sec (54.30%)
<not supported> cpu_core/topdown-fe-bound/
383,694,745 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.879 M/sec (53.96%)
<not supported> cpu_core/topdown-be-bound/
105,850 cpu_core/INT_MISC.UOP_DROPPING/ # 2.725 K/sec
1.213449538 seconds time elapsed
Thanks,
Kan
On Mon, May 1, 2023 at 1:25 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> On 2023-05-01 11:29 a.m., Ian Rogers wrote:
> > The events are displayed twice as there are 2 groups of events. This
> > is changed by:
> > https://lore.kernel.org/lkml/20230429053506.1962559-5-irogers@google.com/
> > where the events are no longer grouped.
>
> The trick doesn't seem to work on a hybrid machine. I still get
> duplicate Topdown events on the e-core.
For hybrid the rest of the patch series is necessary, ie the patches
beyond what's for 6.4, which I take from the output (ie not a crash)
you are looking at. As multiple groups are in play, it looks like the
atom events are on >1 PMU, which can happen as the x86 code
special-cases events with topdown in their name. Some fixes in the
series for this are:
https://lore.kernel.org/lkml/20230429053506.1962559-6-irogers@google.com/
https://lore.kernel.org/lkml/20230429053506.1962559-40-irogers@google.com/
and related:
https://lore.kernel.org/lkml/20230429053506.1962559-19-irogers@google.com/
and so fixing this requires some detective work.
I don't think it should be a requirement for the series that all
hybrid bugs are fixed - especially given the complaints against the
length of the series as-is.
Thanks,
Ian
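As context for the PMU point above: one way to see which PMUs a hybrid
machine exposes is the standard sysfs layout (a sketch; output varies
by machine):
'''
$ ls /sys/bus/event_source/devices/ | grep '^cpu'
cpu_atom
cpu_core
$ cat /sys/bus/event_source/devices/cpu_atom/type   # perf_event_attr.type value for this PMU
'''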
On 2023-05-01 4:48 p.m., Ian Rogers wrote:
> For hybrid the rest of the patch series is necessary, ie the patches
> beyond what's for 6.4, which I take from the output (ie not a crash)
> you are looking at. As multiple groups are in play, it looks like the
> atom events are on >1 PMU, which can happen as the x86 code
> special-cases events with topdown in their name. Some fixes in the
> series for this are:
> https://lore.kernel.org/lkml/20230429053506.1962559-6-irogers@google.com/
> https://lore.kernel.org/lkml/20230429053506.1962559-40-irogers@google.com/
> and related:
> https://lore.kernel.org/lkml/20230429053506.1962559-19-irogers@google.com/
> and so fixing this requires some detective work.
I applied all the patches of the series when I did the test on hybrid.
The above patches don't help.
> I don't think it should be a requirement for the series that all
> hybrid bugs are fixed - especially given the complaints against the
> length of the series as-is.
I agree, especially considering the metrics have been broken on hybrid
for a while. I just want to do a complete test and understand what is
going to be fixed and what hasn't been.
Thanks,
Kan