From nobody Sat Feb 7 21:48:13 2026
From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang
Subject: [PATCH 1/8] perf metric: Fix no group check
Date: Wed, 7 Jun 2023 09:26:53 -0700
Message-Id: <20230607162700.3234712-2-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>
References: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

The no group check fails if there is more than one metric group in
metricgroup_no_group. The first parameter of match_metric() should be
the string, while the substring should be the second parameter.
Fixes: ccc66c609280 ("perf metric: JSON flag to not group events if gathering a metric group")
Signed-off-by: Kan Liang
Acked-by: Ian Rogers
---
 tools/perf/util/metricgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 70ef2e23a710..74f2d8efc02d 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -1175,7 +1175,7 @@ static int metricgroup__add_metric_callback(const struct pmu_metric *pm,

 	if (pm->metric_expr && match_pm_metric(pm, data->pmu, data->metric_name)) {
 		bool metric_no_group = data->metric_no_group ||
-			match_metric(data->metric_name, pm->metricgroup_no_group);
+			match_metric(pm->metricgroup_no_group, data->metric_name);

 		data->has_match = true;
 		ret = add_metric(data->list, pm, data->modifier, metric_no_group,
-- 
2.35.1

From nobody Sat Feb 7 21:48:13 2026
From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang
Subject: [PATCH 2/8] perf evsel: Fix the annotation for hardware events on hybrid
Date: Wed, 7 Jun 2023 09:26:54 -0700
Message-Id: <20230607162700.3234712-3-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>
References: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

The annotation for hardware events is wrong on hybrid.
For example,

 # ./perf stat -a sleep 1

  Performance counter stats for 'system wide':

          32,148.85 msec cpu-clock                 #   32.000 CPUs utilized
                374      context-switches          #   11.633 /sec
                 33      cpu-migrations            #    1.026 /sec
                295      page-faults               #    9.176 /sec
         18,979,960      cpu_core/cycles/          #  590.378 K/sec
        261,230,783      cpu_atom/cycles/          #    8.126 M/sec                  (54.21%)
         17,019,732      cpu_core/instructions/    #  529.404 K/sec
         38,020,470      cpu_atom/instructions/    #    1.183 M/sec                  (63.36%)
          3,296,743      cpu_core/branches/        #  102.546 K/sec
          6,692,338      cpu_atom/branches/        #  208.167 K/sec                  (63.40%)
             96,421      cpu_core/branch-misses/   #    2.999 K/sec
          1,016,336      cpu_atom/branch-misses/   #   31.613 K/sec                  (63.38%)

The hardware events have extended type on hybrid, but evsel__match()
doesn't take it into account. Add a mask to filter the extended type
on hybrid when checking the config.

With the patch,

 # ./perf stat -a sleep 1

  Performance counter stats for 'system wide':

          32,139.90 msec cpu-clock                 #   32.003 CPUs utilized
                343      context-switches          #   10.672 /sec
                 32      cpu-migrations            #    0.996 /sec
                 73      page-faults               #    2.271 /sec
         13,712,841      cpu_core/cycles/          #    0.000 GHz
        258,301,691      cpu_atom/cycles/          #    0.008 GHz                    (54.20%)
         12,428,163      cpu_core/instructions/    #    0.91  insn per cycle
         37,786,557      cpu_atom/instructions/    #    2.76  insn per cycle         (63.35%)
          2,418,826      cpu_core/branches/        #   75.259 K/sec
          6,965,962      cpu_atom/branches/        #  216.739 K/sec                  (63.38%)
             72,150      cpu_core/branch-misses/   #    2.98% of all branches
          1,032,746      cpu_atom/branch-misses/   #   42.70% of all branches        (63.35%)

Signed-off-by: Kan Liang
---
 tools/perf/util/evsel.h       | 12 ++++++-----
 tools/perf/util/stat-shadow.c | 39 +++++++++++++++++++----------------
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index b365b449c6ea..36a32e4ca168 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -350,9 +350,11 @@ u64 format_field__intval(struct tep_format_field *field, struct perf_sample *sam

 struct tep_format_field *evsel__field(struct evsel *evsel, const char *name);

-#define evsel__match(evsel, t, c) \
+#define EVSEL_EVENT_MASK (~0ULL)
+
+#define evsel__match(evsel, t, c, m) \
 	(evsel->core.attr.type == PERF_TYPE_##t && \
-	 evsel->core.attr.config == PERF_COUNT_##c)
+	 (evsel->core.attr.config & m) == PERF_COUNT_##c)

 static inline bool evsel__match2(struct evsel *e1, struct evsel *e2)
 {
@@ -438,13 +440,13 @@ bool evsel__is_function_event(struct evsel *evsel);

 static inline bool evsel__is_bpf_output(struct evsel *evsel)
 {
-	return evsel__match(evsel, SOFTWARE, SW_BPF_OUTPUT);
+	return evsel__match(evsel, SOFTWARE, SW_BPF_OUTPUT, EVSEL_EVENT_MASK);
 }

 static inline bool evsel__is_clock(const struct evsel *evsel)
 {
-	return evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
-	       evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK);
+	return evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK, EVSEL_EVENT_MASK) ||
+	       evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK, EVSEL_EVENT_MASK);
 }

 bool evsel__fallback(struct evsel *evsel, int err, char *msg, size_t msgsize);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 1566a206ba42..074f38b57e2d 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -6,6 +6,7 @@
 #include "color.h"
 #include "debug.h"
 #include "pmu.h"
+#include "pmus.h"
 #include "rblist.h"
 #include "evlist.h"
 #include "expr.h"
@@ -78,6 +79,8 @@ void perf_stat__reset_shadow_stats(void)

 static enum stat_type evsel__stat_type(const struct evsel *evsel)
 {
+	u64 mask = perf_pmus__supports_extended_type() ? PERF_HW_EVENT_MASK : EVSEL_EVENT_MASK;
+
 	/* Fake perf_hw_cache_op_id values for use with evsel__match. */
 	u64 PERF_COUNT_hw_cache_l1d_miss = PERF_COUNT_HW_CACHE_L1D |
 		((PERF_COUNT_HW_CACHE_OP_READ) << 8) |
@@ -97,41 +100,41 @@ static enum stat_type evsel__stat_type(const struct evsel *evsel)

 	if (evsel__is_clock(evsel))
 		return STAT_NSECS;
-	else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES))
+	else if (evsel__match(evsel, HARDWARE, HW_CPU_CYCLES, mask))
 		return STAT_CYCLES;
-	else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS))
+	else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS, mask))
 		return STAT_INSTRUCTIONS;
-	else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
+	else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_FRONTEND, mask))
 		return STAT_STALLED_CYCLES_FRONT;
-	else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND))
+	else if (evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES_BACKEND, mask))
 		return STAT_STALLED_CYCLES_BACK;
-	else if (evsel__match(evsel, HARDWARE, HW_BRANCH_INSTRUCTIONS))
+	else if (evsel__match(evsel, HARDWARE, HW_BRANCH_INSTRUCTIONS, mask))
 		return STAT_BRANCHES;
-	else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES))
+	else if (evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES, mask))
 		return STAT_BRANCH_MISS;
-	else if (evsel__match(evsel, HARDWARE, HW_CACHE_REFERENCES))
+	else if (evsel__match(evsel, HARDWARE, HW_CACHE_REFERENCES, mask))
 		return STAT_CACHE_REFS;
-	else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES))
+	else if (evsel__match(evsel, HARDWARE, HW_CACHE_MISSES, mask))
 		return STAT_CACHE_MISSES;
-	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1D))
+	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1D, mask))
 		return STAT_L1_DCACHE;
-	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1I))
+	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_L1I, mask))
 		return STAT_L1_ICACHE;
-	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_LL))
+	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_LL, mask))
 		return STAT_LL_CACHE;
-	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_DTLB))
+	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_DTLB, mask))
 		return STAT_DTLB_CACHE;
-	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_ITLB))
+	else if (evsel__match(evsel, HW_CACHE, HW_CACHE_ITLB, mask))
 		return STAT_ITLB_CACHE;
-	else if (evsel__match(evsel, HW_CACHE, hw_cache_l1d_miss))
+	else if (evsel__match(evsel, HW_CACHE, hw_cache_l1d_miss, mask))
 		return STAT_L1D_MISS;
-	else if (evsel__match(evsel, HW_CACHE, hw_cache_l1i_miss))
+	else if (evsel__match(evsel, HW_CACHE, hw_cache_l1i_miss, mask))
 		return STAT_L1I_MISS;
-	else if (evsel__match(evsel, HW_CACHE, hw_cache_ll_miss))
+	else if (evsel__match(evsel, HW_CACHE, hw_cache_ll_miss, mask))
 		return STAT_LL_MISS;
-	else if (evsel__match(evsel, HW_CACHE, hw_cache_dtlb_miss))
+	else if (evsel__match(evsel, HW_CACHE, hw_cache_dtlb_miss, mask))
 		return STAT_DTLB_MISS;
-	else if (evsel__match(evsel, HW_CACHE, hw_cache_itlb_miss))
+	else if (evsel__match(evsel, HW_CACHE, hw_cache_itlb_miss, mask))
 		return STAT_ITLB_MISS;
 	return STAT_NONE;
 }
-- 
2.35.1

From nobody Sat Feb 7 21:48:13 2026
From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang
Subject: [PATCH 3/8] perf metric: JSON flag to default metric group
Date: Wed, 7 Jun 2023 09:26:55 -0700
Message-Id: <20230607162700.3234712-4-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>
References: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

For the default output, the default metric group could vary on
different platforms.
For example, on SPR, the TopdownL1 and TopdownL2 metrics should be
displayed in the default mode. On ICL, only the TopdownL1 should be
displayed.

Add a flag so we can tag the default metric group for different
platforms rather than hack the perf code. The flag is added to Intel
TopdownL1 since ICL and TopdownL2 metrics since SPR.

Add a new field, DefaultMetricgroupName, in the JSON file to indicate
the real metric group name.

Signed-off-by: Kan Liang
---
 .../arch/x86/alderlake/adl-metrics.json       | 20 ++++---
 .../arch/x86/icelake/icl-metrics.json         | 20 ++++---
 .../arch/x86/icelakex/icx-metrics.json        | 20 ++++---
 .../arch/x86/sapphirerapids/spr-metrics.json  | 60 +++++++++++--------
 .../arch/x86/tigerlake/tgl-metrics.json       | 20 ++++---
 5 files changed, 84 insertions(+), 56 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index c9f7e3d4ab08..e78c85220e27 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -832,22 +832,24 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
@@ -1112,11 +1114,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) - cpu_core@INT_MISC.UOP_DROPPING@ / tma_info_thread_slots",
-        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
@@ -2316,11 +2319,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%",
         "Unit": "cpu_core"
diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
index 20210742171d..cc4edf855064 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -111,21 +111,23 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
@@ -372,11 +374,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
-        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
@@ -1378,11 +1381,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
index ef25cda019be..6f25b5b7aaf6 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -315,21 +315,23 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_bad_speculation",
         "MetricThreshold": "tma_bad_speculation > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This include slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches are categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.",
         "ScaleUnit": "100%"
     },
@@ -576,11 +578,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots",
-        "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.15",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS",
         "ScaleUnit": "100%"
     },
@@ -1674,11 +1677,12 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
         "ScaleUnit": "100%"
     },
diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
index 4f3dd85540b6..c732982f70b5 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
@@ -340,31 +340,34 @@
     },
     {
         "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend",
+        "DefaultMetricgroupName": "TopdownL1",
         "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
-        "MetricGroup": "TmaL1;TopdownL1;tma_L1_group",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.2",
-        "MetricgroupNoGroup": "TopdownL1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.
Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tm= a_info_thread_slots", - "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", + "MetricGroup": "BadSpec;BrMispredicts;Default;TmaL2;TopdownL2;tma_= L2_group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. 
These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: = tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredict= ions, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -407,11 +410,12 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e Core non-memory issues were of a bottleneck", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)", - "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", + "MetricGroup": "Backend;Compute;Default;TmaL2;TopdownL2;tma_L2_gro= up;tma_backend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. 
FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -509,21 +513,23 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend bandwidth issues", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)", - "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", + "MetricGroup": "Default;FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_gr= oup;tma_frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_thread_ipc / 6 > 0.35", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. 
Related metrics: tma_dsb_sw= itches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_= info_inst_mix_iptb, tma_lcp", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend latency issues", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.U= OP_DROPPING / tma_info_thread_slots", - "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", + "MetricGroup": "Default;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. 
Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -611,11 +617,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. 
Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -630,11 +637,12 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", - "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", + "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEA= VY", "ScaleUnit": "100%" }, @@ -1486,11 +1494,12 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring light-weight operations -- instructions that require= no more than one uop (micro-operation)", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)", - "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", + "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). 
This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1540,11 +1549,12 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Machine Clears", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts= )", - "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", + "MetricGroup": "BadSpec;Default;MachineClears;TmaL2;TopdownL2;tma_= L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. 
Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1576,11 +1586,12 @@ }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", - "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", + "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1784,11 +1795,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. 
issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. 
Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/to= ols/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json index d0538a754288..83346911aa63 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json @@ -105,21 +105,23 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT= _MISC.RECOVERY_CYCLES\\,cmask\\=3D1\\,edge@ / tma_info_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. 
Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -366,11 +368,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. 
Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -1392,11 +1395,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. 
Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, --=20 2.35.1

From nobody Sat Feb 7 21:48:13 2026
From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang , Jing Zhang , John Garry
Subject: [PATCH 4/8] perf vendor events arm64: Add default tags into topdown L1 metrics
Date: Wed, 7 Jun 2023 09:26:56 -0700
Message-Id: <20230607162700.3234712-5-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>
References: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

Add the default tags for ARM as well.
Signed-off-by: Kan Liang Cc: Jing Zhang Cc: John Garry Acked-by: Ian Rogers Reviewed-by: John Garry --- tools/perf/pmu-events/arch/arm64/sbsa.json | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/tools/perf/pmu-events/arch/arm64/sbsa.json b/tools/perf/pmu-ev= ents/arch/arm64/sbsa.json index f678c37ea9c3..f90b338261ac 100644 --- a/tools/perf/pmu-events/arch/arm64/sbsa.json +++ b/tools/perf/pmu-events/arch/arm64/sbsa.json @@ -2,28 +2,32 @@ { "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)", "BriefDescription": "Frontend bound L1 topdown metric", - "MetricGroup": "TopdownL1", + "DefaultMetricgroupName": "TopdownL1", + "MetricGroup": "Default;TopdownL1", "MetricName": "frontend_bound", "ScaleUnit": "100%" }, { "MetricExpr": "(1 - op_retired / op_spec) * (1 - stall_slot / (#sl= ots * cpu_cycles))", "BriefDescription": "Bad speculation L1 topdown metric", - "MetricGroup": "TopdownL1", + "DefaultMetricgroupName": "TopdownL1", + "MetricGroup": "Default;TopdownL1", "MetricName": "bad_speculation", "ScaleUnit": "100%" }, { "MetricExpr": "(op_retired / op_spec) * (1 - stall_slot / (#slots = * cpu_cycles))", "BriefDescription": "Retiring L1 topdown metric", - "MetricGroup": "TopdownL1", + "DefaultMetricgroupName": "TopdownL1", + "MetricGroup": "Default;TopdownL1", "MetricName": "retiring", "ScaleUnit": "100%" }, { "MetricExpr": "stall_slot_backend / (#slots * cpu_cycles)", "BriefDescription": "Backend Bound L1 topdown metric", - "MetricGroup": "TopdownL1", + "DefaultMetricgroupName": "TopdownL1", + "MetricGroup": "Default;TopdownL1", "MetricName": "backend_bound", "ScaleUnit": "100%" } --=20 2.35.1

From nobody Sat Feb 7 21:48:13 2026
From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com,
Kan Liang Subject: [PATCH 5/8] perf stat,jevents: Introduce Default tags for the default mode Date: Wed, 7 Jun 2023 09:26:57 -0700 Message-Id: <20230607162700.3234712-6-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com> References: <20230607162700.3234712-1-kan.liang@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kan Liang Introduce a new metricgroup, Default, to tag all the metric groups which will be collected in the default mode. Add a new field, DefaultMetricgroupName, in the JSON file to indicate the real metric group name. It will be printed in the default output to replace the event names. There is nothing changed for the output format. On SPR, both TopdownL1 and TopdownL2 are displayed in the default output. On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is displayed in the default output. Suggested-by: Stephane Eranian Signed-off-by: Kan Liang --- tools/perf/builtin-stat.c | 4 ++-- tools/perf/pmu-events/jevents.py | 5 +++-- tools/perf/pmu-events/pmu-events.h | 1 + tools/perf/util/metricgroup.c | 3 +++ 4 files changed, 9 insertions(+), 4 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index c87c6897edc9..2269b3e90e9b 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2154,14 +2154,14 @@ static int add_default_attributes(void) * Add TopdownL1 metrics if they exist. To minimize * multiplexing, don't request threshold computation. 
 	 */
-	if (metricgroup__has_metric(pmu, "TopdownL1")) {
+	if (metricgroup__has_metric(pmu, "Default")) {
 		struct evlist *metric_evlist = evlist__new();
 		struct evsel *metric_evsel;

 		if (!metric_evlist)
 			return -1;

-		if (metricgroup__parse_groups(metric_evlist, pmu, "TopdownL1",
+		if (metricgroup__parse_groups(metric_evlist, pmu, "Default",
 						/*metric_no_group=*/false,
 						/*metric_no_merge=*/false,
 						/*metric_no_threshold=*/true,
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 7ed258be1829..12e80bb7939b 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -54,8 +54,8 @@ _json_event_attributes = [
 # Attributes that are in pmu_metric rather than pmu_event.
 _json_metric_attributes = [
     'pmu', 'metric_name', 'metric_group', 'metric_expr', 'metric_threshold',
-    'desc', 'long_desc', 'unit', 'compat', 'metricgroup_no_group', 'aggr_mode',
-    'event_grouping'
+    'desc', 'long_desc', 'unit', 'compat', 'metricgroup_no_group',
+    'default_metricgroup_name', 'aggr_mode', 'event_grouping'
 ]
 # Attributes that are bools or enum int values, encoded as '0', '1',...
 _json_enum_attributes = ['aggr_mode', 'deprecated', 'event_grouping', 'perpkg']
@@ -307,6 +307,7 @@ class JsonEvent:
     self.metric_name = jd.get('MetricName')
     self.metric_group = jd.get('MetricGroup')
     self.metricgroup_no_group = jd.get('MetricgroupNoGroup')
+    self.default_metricgroup_name = jd.get('DefaultMetricgroupName')
     self.event_grouping = convert_metric_constraint(jd.get('MetricConstraint'))
     self.metric_expr = None
     if 'MetricExpr' in jd:
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
index 8cd23d656a5d..caf59f23cd64 100644
--- a/tools/perf/pmu-events/pmu-events.h
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -61,6 +61,7 @@ struct pmu_metric {
 	const char *desc;
 	const char *long_desc;
 	const char *metricgroup_no_group;
+	const char *default_metricgroup_name;
 	enum aggr_mode_class aggr_mode;
 	enum metric_event_groups event_grouping;
 };
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 74f2d8efc02d..efafa02db5e5 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -137,6 +137,8 @@ struct metric {
 	 * output.
 	 */
 	const char *metric_unit;
+	/** Optional default metric group name */
+	const char *default_metricgroup_name;
 	/** Optional null terminated array of referenced metrics.
 	 */
 	struct metric_ref *metric_refs;
 	/**
@@ -219,6 +221,7 @@ static struct metric *metric__new(const struct pmu_metric *pm,

 	m->pmu = pm->pmu ?: "cpu";
 	m->metric_name = pm->metric_name;
+	m->default_metricgroup_name = pm->default_metricgroup_name;
 	m->modifier = NULL;
 	if (modifier) {
 		m->modifier = strdup(modifier);
--
2.35.1

From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org,
    irogers@google.com, namhyung@kernel.org, jolsa@kernel.org,
    adrian.hunter@intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang
Subject: [PATCH 6/8] perf stat,metrics: New metricgroup output for the default mode
Date: Wed, 7 Jun 2023 09:26:58 -0700
Message-Id: <20230607162700.3234712-7-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

In the default mode, the current metricgroup output includes both events
and metrics, which is unnecessary and makes the output hard to read.
Since different architectures (and even different generations of the
same architecture) may use different events, the output also varies
across platforms. For a metricgroup, printing only the value of each
metric is good enough.

Currently, perf may append different metric groups to the same leader
event, or append metrics from the same metricgroup to different events.
That can be confusing when perf prints only the metricgroup output,
e.g., the same metricgroup name may be printed several times.

Reorganize the metricgroups for the default mode and make sure that a
metricgroup can only be appended to one event.
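The one-group-per-leader rule described above can be sketched in Python. This is an illustrative model only, not the perf code; the function name `assign_metricgroups` and the sample event/group pairs are hypothetical:

```python
# Hypothetical model of the default-mode reorganisation: every
# metricgroup name is attached to exactly one leader event, and every
# leader event carries at most one metricgroup, so a group name is
# never printed twice.
def assign_metricgroups(metrics):
    """metrics: iterable of (metricgroup_name, leader_event) pairs.

    Returns {leader_event: metricgroup_name}; a group that already has
    a leader, or an event that already owns a group, is skipped.
    """
    group_to_event = {}  # metricgroup -> chosen leader event
    event_to_group = {}  # leader event -> metricgroup printed for it
    for group, event in sorted(metrics):  # sort by metricgroup name
        if group in group_to_event:
            continue  # group already appended to an event
        if event in event_to_group:
            continue  # event already owns a different group
        group_to_event[group] = event
        event_to_group[event] = group
    return event_to_group

pairs = [("TopdownL1", "TOPDOWN.SLOTS"),
         ("TopdownL2", "TOPDOWN.SLOTS"),  # same leader: must be split
         ("TopdownL1", "INT_MISC.UOP_DROPPING")]
result = assign_metricgroups(pairs)
# Each group appears exactly once, on a distinct event.
assert sorted(result.values()) == ["TopdownL1", "TopdownL2"]
```

The design point mirrored here is only the invariant (one metricgroup per event, one event per metricgroup); the real patch picks the "next available event" from the metric's own event list rather than skipping.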
Sort the metricgroups for the default mode by metricgroup name.

Add a new field, default_metricgroup, in evsel to indicate an event of
the default metricgroup. For those events, printout() prints the
metricgroup name rather than the event names.

Add print_metricgroup_header() to print the metricgroup name in the
different output formats.

On SPR:

Before:

 ./perf_old stat sleep 1

 Performance counter stats for 'sleep 1':

              0.54 msec task-clock:u              #    0.001 CPUs utilized
                  0      context-switches:u       #    0.000 /sec
                  0      cpu-migrations:u         #    0.000 /sec
                 68      page-faults:u            #  125.445 K/sec
            540,970      cycles:u                 #    0.998 GHz
            556,325      instructions:u           #    1.03  insn per cycle
            123,602      branches:u               #  228.018 M/sec
              6,889      branch-misses:u          #    5.57% of all branches
          3,245,820      TOPDOWN.SLOTS:u          #     18.4 %  tma_backend_bound
                                                  #     17.2 %  tma_retiring
                                                  #     23.1 %  tma_bad_speculation
                                                  #     41.4 %  tma_frontend_bound
            564,859      topdown-retiring:u
          1,370,999      topdown-fe-bound:u
            603,271      topdown-be-bound:u
            744,874      topdown-bad-spec:u
             12,661      INT_MISC.UOP_DROPPING:u  #   23.357 M/sec

        1.001798215 seconds time elapsed

        0.000193000 seconds user
        0.001700000 seconds sys

After:

 $ ./perf stat sleep 1

 Performance counter stats for 'sleep 1':

              0.51 msec task-clock:u              #    0.001 CPUs utilized
                  0      context-switches:u       #    0.000 /sec
                  0      cpu-migrations:u         #    0.000 /sec
                 68      page-faults:u            #  132.683 K/sec
            545,228      cycles:u                 #    1.064 GHz
            555,509      instructions:u           #    1.02  insn per cycle
            123,574      branches:u               #  241.120 M/sec
              6,957      branch-misses:u          #    5.63% of all branches
             TopdownL1                 #     17.5 %  tma_backend_bound
                                       #     22.6 %  tma_bad_speculation
                                       #     42.7 %  tma_frontend_bound
                                       #     17.1 %  tma_retiring
             TopdownL2                 #     21.8 %  tma_branch_mispredicts
                                       #     11.5 %  tma_core_bound
                                       #     13.4 %  tma_fetch_bandwidth
                                       #     29.3 %  tma_fetch_latency
                                       #      2.7 %  tma_heavy_operations
                                       #     14.5 %  tma_light_operations
                                       #      0.8 %  tma_machine_clears
                                       #      6.1 %  tma_memory_bound

        1.001712086 seconds time elapsed

        0.000151000 seconds user
        0.001618000 seconds sys

Signed-off-by: Kan Liang
---
 tools/perf/builtin-stat.c | 1 +
tools/perf/util/evsel.h | 1 + tools/perf/util/metricgroup.c | 106 ++++++++++++++++++++++++++++++++- tools/perf/util/metricgroup.h | 1 + tools/perf/util/stat-display.c | 69 ++++++++++++++++++++- 5 files changed, 172 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 2269b3e90e9b..b274cc264d56 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2172,6 +2172,7 @@ static int add_default_attributes(void) =20 evlist__for_each_entry(metric_evlist, metric_evsel) { metric_evsel->skippable =3D true; + metric_evsel->default_metricgroup =3D true; } evlist__splice_list_tail(evsel_list, &metric_evlist->core.entries); evlist__delete(metric_evlist); diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h index 36a32e4ca168..61b1385108f4 100644 --- a/tools/perf/util/evsel.h +++ b/tools/perf/util/evsel.h @@ -130,6 +130,7 @@ struct evsel { bool reset_group; bool errored; bool needs_auxtrace_mmap; + bool default_metricgroup; struct hashmap *per_pkg_mask; int err; struct { diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index efafa02db5e5..22181ce4f27f 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -79,6 +79,7 @@ static struct rb_node *metric_event_new(struct rblist *rb= list __maybe_unused, return NULL; memcpy(me, entry, sizeof(struct metric_event)); me->evsel =3D ((struct metric_event *)entry)->evsel; + me->default_metricgroup_name =3D NULL; INIT_LIST_HEAD(&me->head); return &me->nd; } @@ -1133,14 +1134,19 @@ static int metricgroup__add_metric_sys_event_iter(c= onst struct pmu_metric *pm, /** * metric_list_cmp - list_sort comparator that sorts metrics with more eve= nts to * the front. tool events are excluded from the count. + * For the default metrics, sort them by metricgroup nam= e. 
*/ -static int metric_list_cmp(void *priv __maybe_unused, const struct list_he= ad *l, +static int metric_list_cmp(void *priv, const struct list_head *l, const struct list_head *r) { const struct metric *left =3D container_of(l, struct metric, nd); const struct metric *right =3D container_of(r, struct metric, nd); struct expr_id_data *data; int i, left_count, right_count; + bool is_default =3D *(bool *)priv; + + if (is_default && left->default_metricgroup_name && right->default_metric= group_name) + return strcmp(left->default_metricgroup_name, right->default_metricgroup= _name); =20 left_count =3D hashmap__size(left->pctx->ids); perf_tool_event__for_each_event(i) { @@ -1497,6 +1503,91 @@ static int parse_ids(bool metric_no_merge, struct pe= rf_pmu *fake_pmu, return ret; } =20 +static struct metric_event * +metricgroup__lookup_default_metricgroup(struct rblist *metric_events, + struct evsel *evsel, + struct metric *m) +{ + struct metric_event *me; + char *name; + int err; + + me =3D metricgroup__lookup(metric_events, evsel, true); + if (!me->default_metricgroup_name) { + if (m->pmu && strcmp(m->pmu, "cpu")) + err =3D asprintf(&name, "%s (%s)", m->default_metricgroup_name, m->pmu); + else + err =3D asprintf(&name, "%s", m->default_metricgroup_name); + if (err < 0) + return NULL; + me->default_metricgroup_name =3D name; + } + if (!strncmp(m->default_metricgroup_name, + me->default_metricgroup_name, + strlen(m->default_metricgroup_name))) + return me; + + return NULL; +} + +static struct metric_event * +metricgroup__lookup_create(struct rblist *metric_events, + struct evsel **evsel, + struct list_head *metric_list, + struct metric *m, + bool is_default) +{ + struct metric_event *me; + struct metric *cur; + struct evsel *ev; + size_t i; + + if (!is_default) + return metricgroup__lookup(metric_events, evsel[0], true); + + /* + * If the metric group has been attached to a previous + * event/metric, use that metric event. 
+ */ + list_for_each_entry(cur, metric_list, nd) { + if (cur =3D=3D m) + break; + if (cur->pmu && strcmp(m->pmu, cur->pmu)) + continue; + if (strncmp(m->default_metricgroup_name, + cur->default_metricgroup_name, + strlen(m->default_metricgroup_name))) + continue; + if (!cur->evlist) + continue; + evlist__for_each_entry(cur->evlist, ev) { + me =3D metricgroup__lookup(metric_events, ev, false); + if (!strncmp(m->default_metricgroup_name, + me->default_metricgroup_name, + strlen(m->default_metricgroup_name))) + return me; + } + } + + /* + * Different metric groups may append to the same leader event. + * For example, TopdownL1 and TopdownL2 are appended to the + * TOPDOWN.SLOTS event. + * Split it and append the new metric group to the next available + * event. + */ + me =3D metricgroup__lookup_default_metricgroup(metric_events, evsel[0], m= ); + if (me) + return me; + + for (i =3D 1; i < hashmap__size(m->pctx->ids); i++) { + me =3D metricgroup__lookup_default_metricgroup(metric_events, evsel[i], = m); + if (me) + return me; + } + return NULL; +} + static int parse_groups(struct evlist *perf_evlist, const char *pmu, const char *str, bool metric_no_group, @@ -1512,6 +1603,7 @@ static int parse_groups(struct evlist *perf_evlist, LIST_HEAD(metric_list); struct metric *m; bool tool_events[PERF_TOOL_MAX] =3D {false}; + bool is_default =3D !strcmp(str, "Default"); int ret; =20 if (metric_events_list->nr_entries =3D=3D 0) @@ -1523,7 +1615,7 @@ static int parse_groups(struct evlist *perf_evlist, goto out; =20 /* Sort metrics from largest to smallest. 
*/ - list_sort(NULL, &metric_list, metric_list_cmp); + list_sort((void *)&is_default, &metric_list, metric_list_cmp); =20 if (!metric_no_merge) { struct expr_parse_ctx *combined =3D NULL; @@ -1603,7 +1695,15 @@ static int parse_groups(struct evlist *perf_evlist, goto out; } =20 - me =3D metricgroup__lookup(metric_events_list, metric_events[0], true); + me =3D metricgroup__lookup_create(metric_events_list, + metric_events, + &metric_list, m, + is_default); + if (!me) { + pr_err("Cannot create metric group for default!\n"); + ret =3D -EINVAL; + goto out; + } =20 expr =3D malloc(sizeof(struct metric_expr)); if (!expr) { diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h index bf18274c15df..e3609b853213 100644 --- a/tools/perf/util/metricgroup.h +++ b/tools/perf/util/metricgroup.h @@ -22,6 +22,7 @@ struct cgroup; struct metric_event { struct rb_node nd; struct evsel *evsel; + char *default_metricgroup_name; struct list_head head; /* list of metric_expr */ }; =20 diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index a2bbdc25d979..efe5fd04c033 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -21,10 +21,12 @@ #include "iostat.h" #include "pmu.h" #include "pmus.h" +#include "metricgroup.h" =20 #define CNTR_NOT_SUPPORTED "" #define CNTR_NOT_COUNTED "" =20 +#define MGROUP_LEN 50 #define METRIC_LEN 38 #define EVNAME_LEN 32 #define COUNTS_LEN 18 @@ -707,6 +709,55 @@ static bool evlist__has_hybrid(struct evlist *evlist) return false; } =20 +static void print_metricgroup_header_json(struct perf_stat_config *config, + struct outstate *os __maybe_unused, + const char *metricgroup_name) +{ + fprintf(config->output, "\"metricgroup\" : \"%s\"}", metricgroup_name); + new_line_json(config, (void *)os); +} + +static void print_metricgroup_header_csv(struct perf_stat_config *config, + struct outstate *os, + const char *metricgroup_name) +{ + int i; + + for (i =3D 0; i < os->nfields; i++) + 
fputs(config->csv_sep, os->fh); + fprintf(config->output, "%s", metricgroup_name); + new_line_csv(config, (void *)os); +} + +static void print_metricgroup_header_std(struct perf_stat_config *config, + struct outstate *os __maybe_unused, + const char *metricgroup_name) +{ + int n =3D fprintf(config->output, " %*s", EVNAME_LEN, metricgroup_name); + + fprintf(config->output, "%*s", MGROUP_LEN - n - 1, ""); +} + +static void print_metricgroup_header(struct perf_stat_config *config, + struct outstate *os, + struct evsel *counter, + double noise, u64 run, u64 ena, + const char *metricgroup_name) +{ + aggr_printout(config, os->evsel, os->id, os->aggr_nr); + + print_noise(config, counter, noise, /*before_metric=3D*/true); + print_running(config, run, ena, /*before_metric=3D*/true); + + if (config->json_output) { + print_metricgroup_header_json(config, os, metricgroup_name); + } else if (config->csv_output) { + print_metricgroup_header_csv(config, os, metricgroup_name); + } else + print_metricgroup_header_std(config, os, metricgroup_name); + +} + static void printout(struct perf_stat_config *config, struct outstate *os, double uval, u64 run, u64 ena, double noise, int aggr_idx) { @@ -751,10 +802,17 @@ static void printout(struct perf_stat_config *config,= struct outstate *os, out.force_header =3D false; =20 if (!config->metric_only) { - abs_printout(config, os->id, os->aggr_nr, counter, uval, ok); + if (counter->default_metricgroup) { + struct metric_event *me; =20 - print_noise(config, counter, noise, /*before_metric=3D*/true); - print_running(config, run, ena, /*before_metric=3D*/true); + me =3D metricgroup__lookup(&config->metric_events, counter, false); + print_metricgroup_header(config, os, counter, noise, run, ena, + me->default_metricgroup_name); + } else { + abs_printout(config, os->id, os->aggr_nr, counter, uval, ok); + print_noise(config, counter, noise, /*before_metric=3D*/true); + print_running(config, run, ena, /*before_metric=3D*/true); + } } =20 if (ok) { @@ 
-883,6 +941,11 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
 	if (counter->merged_stat)
 		return;

+	/* Only print the metric group for the default mode */
+	if (counter->default_metricgroup &&
+	    !metricgroup__lookup(&config->metric_events, counter, false))
+		return;
+
 	uniquify_counter(config, counter);

 	val = aggr->counts.val;
--
2.35.1

From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org,
    irogers@google.com, namhyung@kernel.org, jolsa@kernel.org,
    adrian.hunter@intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang
Subject: [PATCH 7/8] perf tests: Support metricgroup perf stat JSON output
Date: Wed, 7 Jun 2023 09:26:59 -0700
Message-Id: <20230607162700.3234712-8-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

A new field, metricgroup, has been added to the perf stat JSON output.
Support it in the test case.
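The kind of check involved can be sketched as follows. This is a simplified, hypothetical stand-in for the lint logic (the function `check_line` and the sample JSON lines are illustrative, not taken from perf_json_output_lint.py): a metricgroup header line carries only a few fields, so a strict field-count check must special-case it.

```python
import json

# Sketch: validate one line of perf stat JSON output. Counter lines
# must carry exactly `expected_items` fields; metricgroup header lines
# are shorter and are accepted separately.
def check_line(line, expected_items):
    item = json.loads(line)
    count = len(item)
    if count == expected_items:
        return True
    # Metricgroup header lines have between 1 and 5 fields.
    if 1 <= count <= 5 and 'metricgroup' in item:
        return True
    raise RuntimeError(f'wrong number of fields {count} in {line!r}')

# A 7-field counter line and a 1-field metricgroup header line:
counter = ('{"counter-value":"545228","unit":"","event":"cycles:u",'
           '"event-runtime":510,"pcnt-running":100.0,'
           '"metric-value":1.064,"metric-unit":"GHz"}')
group = '{"metricgroup":"TopdownL1"}'
assert check_line(counter, 7)
assert check_line(group, 7)
```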
Signed-off-by: Kan Liang
Acked-by: Ian Rogers
---
 tools/perf/tests/shell/lib/perf_json_output_lint.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/shell/lib/perf_json_output_lint.py b/tools/perf/tests/shell/lib/perf_json_output_lint.py
index b81582a89d36..5e9bd68c83fe 100644
--- a/tools/perf/tests/shell/lib/perf_json_output_lint.py
+++ b/tools/perf/tests/shell/lib/perf_json_output_lint.py
@@ -55,6 +55,7 @@ def check_json_output(expected_items):
       'interval': lambda x: isfloat(x),
       'metric-unit': lambda x: True,
       'metric-value': lambda x: isfloat(x),
+      'metricgroup': lambda x: True,
       'node': lambda x: True,
       'pcnt-running': lambda x: isfloat(x),
       'socket': lambda x: True,
@@ -70,6 +71,8 @@ def check_json_output(expected_items):
       # values and possibly other prefixes like interval, core and
       # aggregate-number.
       pass
+    elif count != expected_items and count >= 1 and count <= 5 and 'metricgroup' in item:
+      pass
     elif count != expected_items:
       raise RuntimeError(f'wrong number of fields.
counted {count} expected {expected_items}'
                         f' in \'{item}\'')
--
2.35.1

From: kan.liang@linux.intel.com
To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org,
    irogers@google.com, namhyung@kernel.org, jolsa@kernel.org,
    adrian.hunter@intel.com, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang
Subject: [PATCH 8/8] perf test: Add test case for the standard perf stat output
Date: Wed, 7 Jun 2023 09:27:00 -0700
Message-Id: <20230607162700.3234712-9-kan.liang@linux.intel.com>
In-Reply-To: <20230607162700.3234712-1-kan.liang@linux.intel.com>

From: Kan Liang

Add a new test case to verify the standard perf stat output with
different options.
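The core of what the new test checks can be sketched in Python. This is a rough, hypothetical model of the shell linter's classification step (the function `classify` and the name lists are illustrative, not taken from stat+std_output.sh): every body line of the standard output must name either a known default event or a default metricgroup header such as TopdownL1/TopdownL2.

```python
# Sketch: classify one body line of standard `perf stat` output as an
# event line, a metricgroup header, or unknown. The real test fails on
# unknown lines and also verifies the metric comment after '#'.
EVENTS = ("task-clock", "context-switches", "cpu-migrations",
          "page-faults", "cycles", "instructions", "branches",
          "branch-misses")
METRICGROUPS = ("TopdownL1", "TopdownL2")

def classify(line):
    body = line.split('#', 1)[0].strip()  # drop the metric comment
    for group in METRICGROUPS:            # metricgroup headers first:
        if group in body:                 # they carry no counter value
            return 'metricgroup'
    for event in EVENTS:
        if event in body:
            return 'event'
    return 'unknown'

assert classify("     545,228      cycles:u    #  1.064 GHz") == 'event'
assert classify("  TopdownL1    #  17.5 % tma_backend_bound") == 'metricgroup'
```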
Signed-off-by: Kan Liang --- tools/perf/tests/shell/stat+std_output.sh | 259 ++++++++++++++++++++++ 1 file changed, 259 insertions(+) create mode 100755 tools/perf/tests/shell/stat+std_output.sh diff --git a/tools/perf/tests/shell/stat+std_output.sh b/tools/perf/tests/s= hell/stat+std_output.sh new file mode 100755 index 000000000000..b9db0f245450 --- /dev/null +++ b/tools/perf/tests/shell/stat+std_output.sh @@ -0,0 +1,259 @@ +#!/bin/bash +# perf stat STD output linter +# SPDX-License-Identifier: GPL-2.0 +# Tests various perf stat STD output commands for +# default event and metricgroup + +set -e + +skip_test=3D0 + +stat_output=3D$(mktemp /tmp/__perf_test.stat_output.std.XXXXX) + +event_name=3D(cpu-clock task-clock context-switches cpu-migrations page-fa= ults cycles instructions branches branch-misses stalled-cycles-frontend sta= lled-cycles-backend) +event_metric=3D("CPUs utilized" "CPUs utilized" "/sec" "/sec" "/sec" "GHz"= "insn per cycle" "/sec" "of all branches" "frontend cycles idle" "backend = cycles idle") + +metricgroup_name=3D(TopdownL1 TopdownL2) + +cleanup() { + rm -f "${stat_output}" + + trap - EXIT TERM INT +} + +trap_cleanup() { + cleanup + exit 1 +} +trap trap_cleanup EXIT TERM INT + +function commachecker() +{ + local -i cnt=3D0 + local prefix=3D1 + + case "$1" + in "--interval") prefix=3D2 + ;; "--per-thread") prefix=3D2 + ;; "--system-wide-no-aggr") prefix=3D2 + ;; "--per-core") prefix=3D3 + ;; "--per-socket") prefix=3D3 + ;; "--per-node") prefix=3D3 + ;; "--per-die") prefix=3D3 + ;; "--per-cache") prefix=3D3 + esac + + while read line + do + # Ignore initial "started on" comment. + x=3D${line:0:1} + [ "$x" =3D "#" ] && continue + # Ignore initial blank line. 
+ [ "$line" =3D "" ] && continue + # Ignore "Performance counter stats" + x=3D${line:0:25} + [ "$x" =3D "Performance counter stats" ] && continue + # Ignore "seconds time elapsed" and break + [[ "$line" =3D=3D *"time elapsed"* ]] && break + + main_body=3D$(echo $line | cut -d' ' -f$prefix-) + x=3D${main_body%#*} + # Check default metricgroup + y=3D$(echo $x | tr -d ' ') + [ "$y" =3D "" ] && continue + for i in "${!metricgroup_name[@]}"; do + [[ "$y" =3D=3D *"${metricgroup_name[$i]}"* ]] && break + done + [[ "$y" =3D=3D *"${metricgroup_name[$i]}"* ]] && continue + + # Check default event + for i in "${!event_name[@]}"; do + [[ "$x" =3D=3D *"${event_name[$i]}"* ]] && break + done + + [[ ! "$x" =3D=3D *"${event_name[$i]}"* ]] && { + echo "Unknown event name in $line" 1>&2 + exit 1; + } + + # Check event metric if it exists + [[ ! "$main_body" =3D=3D *"#"* ]] && continue + [[ ! "$main_body" =3D=3D *"${event_metric[$i]}"* ]] && { + echo "wrong event metric. expected ${event_metric[$i]} in $line" 1>&2 + exit 1; + } + done < "${stat_output}" + return 0 +} + +# Return true if perf_event_paranoid is > $1 and not running as root. 
+function ParanoidAndNotRoot() +{ + [ $(id -u) !=3D 0 ] && [ $(cat /proc/sys/kernel/perf_event_paranoid) -gt= $1 ] +} + +check_no_args() +{ + echo -n "Checking STD output: no args " + perf stat -o "${stat_output}" true + commachecker --no-args + echo "[Success]" +} + +check_system_wide() +{ + echo -n "Checking STD output: system wide " + if ParanoidAndNotRoot 0 + then + echo "[Skip] paranoid and not root" + return + fi + perf stat -a -o "${stat_output}" true + commachecker --system-wide + echo "[Success]" +} + +check_system_wide_no_aggr() +{ + echo -n "Checking STD output: system wide no aggregation " + if ParanoidAndNotRoot 0 + then + echo "[Skip] paranoid and not root" + return + fi + perf stat -A -a --no-merge -o "${stat_output}" true + commachecker --system-wide-no-aggr + echo "[Success]" +} + +check_interval() +{ + echo -n "Checking STD output: interval " + perf stat -I 1000 -o "${stat_output}" true + commachecker --interval + echo "[Success]" +} + + +check_per_core() +{ + echo -n "Checking STD output: per core " + if ParanoidAndNotRoot 0 + then + echo "[Skip] paranoid and not root" + return + fi + perf stat --per-core -a -o "${stat_output}" true + commachecker --per-core + echo "[Success]" +} + +check_per_thread() +{ + echo -n "Checking STD output: per thread " + if ParanoidAndNotRoot 0 + then + echo "[Skip] paranoid and not root" + return + fi + perf stat --per-thread -a -o "${stat_output}" true + commachecker --per-thread + echo "[Success]" +} + +check_per_cache_instance() +{ + echo -n "Checking STD output: per cache instance " + if ParanoidAndNotRoot 0 + then + echo "[Skip] paranoid and not root" + return + fi + perf stat --per-cache -a true 2>&1 | commachecker --per-cache + echo "[Success]" +} + +check_per_die() +{ + echo -n "Checking STD output: per die " + if ParanoidAndNotRoot 0 + then + echo "[Skip] paranoid and not root" + return + fi + perf stat --per-die -a -o "${stat_output}" true + commachecker --per-die + echo "[Success]" +} + +check_per_node() 
+{
+	echo -n "Checking STD output: per node "
+	if ParanoidAndNotRoot 0
+	then
+		echo "[Skip] paranoid and not root"
+		return
+	fi
+	perf stat --per-node -a -o "${stat_output}" true
+	commachecker --per-node
+	echo "[Success]"
+}
+
+check_per_socket()
+{
+	echo -n "Checking STD output: per socket "
+	if ParanoidAndNotRoot 0
+	then
+		echo "[Skip] paranoid and not root"
+		return
+	fi
+	perf stat --per-socket -a -o "${stat_output}" true
+	commachecker --per-socket
+	echo "[Success]"
+}
+
+# The perf stat options for per-socket, per-core, per-die
+# and -A (no_aggr mode) use the info fetched from this
+# directory: "/sys/devices/system/cpu/cpu*/topology". For
+# example, the socket value is fetched from the "physical_package_id"
+# file in the topology directory.
+# Reference: cpu__get_topology_int in util/cpumap.c
+# If the platform doesn't expose topology information, values
+# will be set to -1. For example, in the case of the pSeries
+# platform of powerpc, the value for "physical_package_id" is
+# restricted and set to -1. The check here validates the socket-id
+# read from the topology file before proceeding further.
+
+FILE_LOC="/sys/devices/system/cpu/cpu*/topology/"
+FILE_NAME="physical_package_id"
+
+check_for_topology()
+{
+	if ! ParanoidAndNotRoot 0
+	then
+		socket_file=`ls $FILE_LOC/$FILE_NAME | head -n 1`
+		[ -z $socket_file ] && return 0
+		socket_id=`cat $socket_file`
+		[ $socket_id == -1 ] && skip_test=1
+		return 0
+	fi
+}
+
+check_for_topology
+check_no_args
+check_system_wide
+check_interval
+check_per_thread
+check_per_node
+if [ $skip_test -ne 1 ]
+then
+	check_system_wide_no_aggr
+	check_per_core
+	check_per_cache_instance
+	check_per_die
+	check_per_socket
+else
+	echo "[Skip] Skipping tests for system_wide_no_aggr, per_core, per_die and per_socket since socket id exposed via topology is invalid"
+fi
+cleanup
+exit 0
--
2.35.1