From: Jing Zhang <renyu.zj@linux.alibaba.com>
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
    Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org,
    Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 1/8] perf pmu: "Compat" supports matching multiple identifiers
Date: Mon, 21 Aug 2023 16:36:10 +0800
Message-Id: <1692606977-92009-2-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>

The jevents "Compat" value is used for uncore PMU alias or metric
definitions. The same PMU driver has different PMU identifiers for
different hardware versions and types, but those PMUs may share common
events. Since a Compat value could only match one identifier, adding the
same event alias to PMUs with different identifiers meant defining the
alias once per identifier, which is not streamlined. So let "Compat"
support matching multiple identifiers for uncore PMU aliases.

For example, the Compat value {43401;436*} matches the PMU identifier
"43401", that is, CMN600_r0p0, as well as any PMU identifier with the
prefix "436", that is, all CMN650, where "*" is a wildcard. Tokens in
the Compat field are delimited by ';' with no spaces.

Signed-off-by: Jing Zhang
Reviewed-by: John Garry
---
 tools/perf/util/pmu.c | 33 +++++++++++++++++++++++++++++++--
 tools/perf/util/pmu.h |  1 +
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index ad209c8..6402423 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -776,6 +776,35 @@ static bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
 	return res;
 }
 
+bool pmu_uncore_identifier_match(const char *id, const char *compat)
+{
+	char *tmp = NULL, *tok, *str;
+	bool res;
+	int n;
+
+	/*
+	 * The strdup() call is necessary here because "compat" is a const str
+	 * type and cannot be used as an argument to strtok_r().
+	 */
+	str = strdup(compat);
+	if (!str)
+		return false;
+
+	tok = strtok_r(str, ";", &tmp);
+	for (; tok; tok = strtok_r(NULL, ";", &tmp)) {
+		n = strlen(tok);
+		if ((tok[n - 1] == '*' && !strncmp(id, tok, n - 1)) ||
+		    !strcmp(id, tok)) {
+			res = true;
+			goto out;
+		}
+	}
+	res = false;
+out:
+	free(str);
+	return res;
+}
+
 struct pmu_add_cpu_aliases_map_data {
 	struct list_head *head;
 	const char *name;
@@ -847,8 +876,8 @@ static int pmu_add_sys_aliases_iter_fn(const struct pmu_event *pe,
 	if (!pe->compat || !pe->pmu)
 		return 0;
 
-	if (!strcmp(pmu->id, pe->compat) &&
-	    pmu_uncore_alias_match(pe->pmu, pmu->name)) {
+	if (pmu_uncore_alias_match(pe->pmu, pmu->name) &&
+	    pmu_uncore_identifier_match(pmu->id, pe->compat)) {
 		__perf_pmu__new_alias(idata->head, -1,
 				      (char *)pe->name,
 				      (char *)pe->desc,
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index b9a02de..9d4385d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -241,6 +241,7 @@ void pmu_add_cpu_aliases_table(struct list_head *head, struct perf_pmu *pmu,
 char *perf_pmu__getcpuid(struct perf_pmu *pmu);
 const struct pmu_events_table *pmu_events_table__find(void);
 const struct pmu_metrics_table *pmu_metrics_table__find(void);
+bool pmu_uncore_identifier_match(const char *id, const char *compat);
 void perf_pmu_free_alias(struct perf_pmu_alias *alias);
 
 int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
-- 
1.8.3.1
From: Jing Zhang <renyu.zj@linux.alibaba.com>
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
    Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org,
    Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 2/8] perf metric: "Compat" supports matching multiple identifiers
Date: Mon, 21 Aug 2023 16:36:11 +0800
Message-Id: <1692606977-92009-3-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>

The jevents "Compat" value is used for uncore PMU alias or metric
definitions. The same PMU driver has different PMU identifiers for
different hardware versions and types, but those PMUs may share common
metrics. Since a Compat value could only match one identifier, adding
the same metric to PMUs with different identifiers meant defining the
metric once per identifier, which is not streamlined. So let "Compat"
support matching multiple identifiers for uncore PMU metrics as well.

Signed-off-by: Jing Zhang
Reviewed-by: John Garry
Reviewed-by: Ian Rogers
---
 tools/perf/util/metricgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 5e9c657..ff81bc5 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -477,7 +477,7 @@ static int metricgroup__sys_event_iter(const struct pmu_metric *pm,
 
 	while ((pmu = perf_pmu__scan(pmu))) {
 
-		if (!pmu->id || strcmp(pmu->id, pm->compat))
+		if (!pmu->id || !pmu_uncore_identifier_match(pmu->id, pm->compat))
 			continue;
 
 		return d->fn(pm, table, d->data);
-- 
1.8.3.1
From: Jing Zhang <renyu.zj@linux.alibaba.com>
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
    Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org,
    Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 3/8] perf vendor events: Supplement the omitted EventCode
Date: Mon, 21 Aug 2023 16:36:12 +0800
Message-Id: <1692606977-92009-4-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>

If there is an "event=0" in the event description, the EventCode can be
omitted in the JSON file, and jevents.py automatically fills in
"event=0" during parsing. However, for some events where both EventCode
and ConfigCode are missing, filling in "event=0" automatically is not
appropriate; for example, a CMN event description is typically
"type=xxx, eventid=xxx". Therefore, before modifying jevents.py to stop
adding "event=0" by default, all omitted EventCodes need to be filled in
explicitly first.
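As an illustration (a minimal made-up entry, not taken verbatim from the series), an event that previously relied on the implicit default would now carry an explicit EventCode, in the same way as the hunks below:

```json
[
    {
        "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
        "EventCode": "0x0",
        "EventName": "CPU_CLK_UNHALTED.CORE",
        "SampleAfterValue": "2000003"
    }
]
```

Before this series, the "EventCode" line could be left out and jevents.py would emit "event=0" anyway; a CMN event described only by "type=xxx, eventid=xxx" must not receive that default.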
Signed-off-by: Jing Zhang
---
 tools/perf/pmu-events/arch/x86/alderlake/pipeline.json     |  9 +++++++++
 tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json    |  3 +++
 tools/perf/pmu-events/arch/x86/broadwell/pipeline.json     |  4 ++++
 tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json   |  4 ++++
 .../perf/pmu-events/arch/x86/broadwellde/uncore-cache.json |  2 ++
 .../arch/x86/broadwellde/uncore-interconnect.json          |  1 +
 .../pmu-events/arch/x86/broadwellde/uncore-memory.json     |  1 +
 .../perf/pmu-events/arch/x86/broadwellde/uncore-power.json |  1 +
 tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json    |  4 ++++
 .../perf/pmu-events/arch/x86/broadwellx/uncore-cache.json  |  2 ++
 .../arch/x86/broadwellx/uncore-interconnect.json           | 13 +++++++++++++
 .../pmu-events/arch/x86/broadwellx/uncore-memory.json      |  2 ++
 .../perf/pmu-events/arch/x86/broadwellx/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json  |  4 ++++
 .../pmu-events/arch/x86/cascadelakex/uncore-cache.json     |  2 ++
 .../arch/x86/cascadelakex/uncore-interconnect.json         |  1 +
 tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json |  1 +
 .../pmu-events/arch/x86/cascadelakex/uncore-memory.json    |  1 +
 .../pmu-events/arch/x86/cascadelakex/uncore-power.json     |  1 +
 tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json   |  2 ++
 tools/perf/pmu-events/arch/x86/goldmont/pipeline.json      |  3 +++
 tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json  |  3 +++
 tools/perf/pmu-events/arch/x86/grandridge/pipeline.json    |  3 +++
 tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json |  4 ++++
 tools/perf/pmu-events/arch/x86/haswell/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/haswellx/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json  |  2 ++
 .../pmu-events/arch/x86/haswellx/uncore-interconnect.json  | 14 ++++++++++++++
 tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json |  2 ++
 tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/icelake/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/icelakex/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json  |  1 +
 .../pmu-events/arch/x86/icelakex/uncore-interconnect.json  |  1 +
 tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json |  1 +
 tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json     |  3 +++
 tools/perf/pmu-events/arch/x86/ivytown/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json   |  2 ++
 .../pmu-events/arch/x86/ivytown/uncore-interconnect.json   | 11 +++++++++++
 tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json  |  1 +
 tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json   |  1 +
 tools/perf/pmu-events/arch/x86/jaketown/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json  |  2 ++
 .../pmu-events/arch/x86/jaketown/uncore-interconnect.json  | 12 ++++++++++++
 tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json |  1 +
 tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json  |  2 ++
 .../perf/pmu-events/arch/x86/knightslanding/pipeline.json  |  3 +++
 .../pmu-events/arch/x86/knightslanding/uncore-cache.json   |  1 +
 .../pmu-events/arch/x86/knightslanding/uncore-memory.json  |  4 ++++
 tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json    |  8 ++++++++
 tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json   |  4 ++++
 .../perf/pmu-events/arch/x86/sapphirerapids/pipeline.json  |  5 +++++
 tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json  |  4 ++++
 tools/perf/pmu-events/arch/x86/silvermont/pipeline.json    |  3 +++
 tools/perf/pmu-events/arch/x86/skylake/pipeline.json       |  4 ++++
 tools/perf/pmu-events/arch/x86/skylakex/pipeline.json      |  4 ++++
 tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json  |  2 ++
 .../pmu-events/arch/x86/skylakex/uncore-interconnect.json  |  1 +
 tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json     |  1 +
 tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json |  1 +
 tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json  |  1 +
 tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json    |  2 ++
 .../perf/pmu-events/arch/x86/snowridgex/uncore-cache.json  |  1 +
 .../arch/x86/snowridgex/uncore-interconnect.json           |  1 +
 .../pmu-events/arch/x86/snowridgex/uncore-memory.json      |  1 +
 .../pmu-events/arch/x86/snowridgex/uncore-power.json       |  1 +
 tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json     |  5 +++++
 68 files changed, 211 insertions(+)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
index cb5b861..7054426 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/pipeline.json
@@ -489,6 +489,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
@@ -550,6 +551,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -558,6 +560,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -584,6 +587,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
@@ -592,6 +596,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -743,6 +748,7 @@
     },
     {
         "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
@@ -752,6 +758,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -796,6 +803,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired with PEBS precise-distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a precise distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR++) feature to fix bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -1160,6 +1168,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
index fa53ff1..345d1c8 100644
--- a/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/alderlaken/pipeline.json
@@ -211,6 +211,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
@@ -225,6 +226,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -240,6 +242,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted core clock cycles. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
index 9a902d2..b114d0d 100644
--- a/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/broadwell/pipeline.json
@@ -336,6 +336,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -366,6 +368,7 @@
     },
     {
         "AnyThread": "1",
+        "EventCode": "0x0",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
@@ -514,6 +517,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
index 9a902d2..ce90d058 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/pipeline.json
@@ -336,6 +336,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. \nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -367,6 +369,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -514,6 +517,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
index 56bba6d..117be19 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-cache.json
@@ -8,6 +8,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -1501,6 +1502,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA. This will be slightly different than the count in the Ubox because of enable/freeze delays. The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
index 8a327e0..ce54bd3 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-interconnect.json
@@ -19,6 +19,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
index a764234..32c46bd 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-memory.json
@@ -131,6 +131,7 @@
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
index 83d2013..f57eb8e 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellde/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
index 9a902d2..ce90d058 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/pipeline.json
@@ -336,6 +336,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.
\nNote: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. This event is clocked by base clock (100 Mhz) on Sandy Bridge. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -359,6 +360,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -367,6 +369,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -514,6 +517,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. \nNotes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. \nCounting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
index 400d784..346f5cf 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-cache.json
@@ -183,6 +183,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -1689,6 +1690,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA.
This will be slightly different than the count in the Ubox because of enable/freeze delays. The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
index e61a23f..df96e41 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-interconnect.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data",
+        "EventCode": "0x0",
         "EventName": "QPI_CTL_BANDWIDTH_TX",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.",
@@ -10,6 +11,7 @@
     },
     {
         "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data",
+        "EventCode": "0x0",
         "EventName": "QPI_DATA_BANDWIDTH_TX",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This does not include the header flits that go in data packets.",
@@ -37,6 +39,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
@@ -1400,6 +1403,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This does not include the header flits that go in data packets.",
@@ -1408,6 +1412,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.",
@@ -1416,6 +1421,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link.
This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency.",
@@ -1424,6 +1430,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the data flits (not the header).",
@@ -1432,6 +1439,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the header flits (not the data). This includes extended headers.",
@@ -1440,6 +1448,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.
To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.",
@@ -1448,6 +1457,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel. These are most commonly snoop responses, and this event can be used as a proxy for that.",
@@ -1456,6 +1466,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel. This basically counts the number of remote memory requests transmitted over QPI. In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.",
@@ -1464,6 +1475,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of snoop request flits transmitted over QPI. These requests are contained in the snoop channel. This does not include snoop responses, which are transmitted on the home channel.",
@@ -3162,6 +3174,7 @@
     },
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_S_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "SBOX"
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
index b5a33e7a..0c5888d 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-memory.json
@@ -158,12 +158,14 @@
     },
     {
         "BriefDescription": "Clockticks in the Memory Controller using one of the programmable counters",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS_P",
         "PerPkg": "1",
         "Unit": "iMC"
     },
     {
         "BriefDescription": "This event is deprecated.
Refer to new event UNC_M_CLOCKTICKS_P",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
index 83d2013..f57eb8e 100644
--- a/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/broadwellx/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
index 0f06e31..99346e1 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json
@@ -191,6 +191,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -222,6 +223,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -230,6 +232,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -369,6 +372,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
index 2c88053..ba7a6f6 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-cache.json
@@ -512,6 +512,7 @@
     },
     {
         "BriefDescription": "Uncore cache clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
@@ -5792,6 +5793,7 @@
     },
     {
         "BriefDescription": "This event is deprecated.
Refer to new event = UNC_CHA_CLOCKTICKS", + "EventCode": "0x0", "Deprecated": "1", "EventName": "UNC_C_CLOCKTICKS", "PerPkg": "1", diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnec= t.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.js= on index 725780f..43d7b24 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json @@ -1090,6 +1090,7 @@ }, { "BriefDescription": "Cycles - at UCLK", + "EventCode": "0x0", "EventName": "UNC_M2M_CLOCKTICKS", "PerPkg": "1", "Unit": "M2M" diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json b/t= ools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json index 743c91f..377d54f 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-io.json @@ -1271,6 +1271,7 @@ }, { "BriefDescription": "Counting disabled", + "EventCode": "0x0", "EventName": "UNC_IIO_NOTHING", "PerPkg": "1", "Unit": "IIO" diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json= b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json index f761856..77bb0ea 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json @@ -167,6 +167,7 @@ }, { "BriefDescription": "Memory controller clock ticks", + "EventCode": "0x0", "EventName": "UNC_M_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "Counts clockticks of the fixed frequency clo= ck of the memory controller using one of the programmable counters.", diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json = b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json index c6254af..a01b279 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-power.json @@ -1,6 +1,7 @@ [ { 
"BriefDescription": "pclk Cycles", + "EventCode": "0x0", "EventName": "UNC_P_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "The PCU runs off a fixed 1 GHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.", diff --git a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json index 9dd8c90..3388cd5 100644 --- a/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/elkhartlake/pipeline.json @@ -150,6 +150,7 @@ }, { "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC). This event uses fixed counter 2.", "SampleAfterValue": "2000003", @@ -179,6 +180,7 @@ }, { "BriefDescription": "Counts the total number of instructions retired. (Fixed event)", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "1", "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers.
This event uses fixed counter 0.", diff --git a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json index acb8974..79806e7 100644 --- a/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/goldmont/pipeline.json @@ -143,6 +143,7 @@ }, { "BriefDescription": "Core cycles when core is not halted (Fixed event)", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.CORE", "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1. You cannot collect a PEBs record for this event.", "SampleAfterValue": "2000003", @@ -165,6 +166,7 @@ }, { "BriefDescription": "Reference cycles when core is not halted (Fixed event)", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. This event uses fixed counter 2. You cannot collect a PEBs record for this event.", "SampleAfterValue": "2000003", @@ -187,6 +189,7 @@ }, { "BriefDescription": "Instructions retired (Fixed event)", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PublicDescription": "Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.
This event uses fixed counter 0. You cannot collect a PEBs record for this event.", "SampleAfterValue": "2000003", diff --git a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json index 33ef331..1be1b50 100644 --- a/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/goldmontplus/pipeline.json @@ -143,6 +143,7 @@ }, { "BriefDescription": "Core cycles when core is not halted (Fixed event)", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.CORE", "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. This event uses fixed counter 1. You cannot collect a PEBs record for this event.", "SampleAfterValue": "2000003", @@ -165,6 +166,7 @@ }, { "BriefDescription": "Reference cycles when core is not halted (Fixed event)", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. In mobile systems the core frequency may change from time. This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. This event uses fixed counter 2. You cannot collect a PEBs record for this event.", "SampleAfterValue": "2000003", @@ -187,6 +189,7 @@ }, { "BriefDescription": "Instructions retired (Fixed event)", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "2", "PublicDescription": "Counts the number of instructions that retire execution.
For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0. You cannot collect a PEBs record for this event.", diff --git a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json index 4121295..5335a7b 100644 --- a/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/grandridge/pipeline.json @@ -29,6 +29,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "SampleAfterValue": "2000003", "UMask": "0x3" @@ -43,6 +44,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "SampleAfterValue": "2000003", "UMask": "0x2" @@ -55,6 +57,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of instructions retired", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "1", "SampleAfterValue": "2000003", diff --git a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json index 764c043..6ca34b9 100644 --- a/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/graniterapids/pipeline.json @@ -17,6 +17,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction.
This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.", "SampleAfterValue": "2000003", @@ -32,6 +33,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt state", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state.
It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.", "SampleAfterValue": "2000003", @@ -46,6 +48,7 @@ }, { "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "1", "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.", @@ -78,6 +81,7 @@ }, { "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event", + "EventCode": "0x0", "EventName": "TOPDOWN.SLOTS", "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).", "SampleAfterValue": "10000003", diff --git a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json index 540f437..0d5eafd 100644 --- a/tools/perf/pmu-events/arch/x86/haswell/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/haswell/pipeline.json @@ -303,6 +303,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state.
The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.", "SampleAfterValue": "2000003", @@ -327,6 +328,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.", "SampleAfterValue": "2000003", @@ -335,6 +337,7 @@ { "AnyThread": "1", "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", "SampleAfterValue": "2000003", "UMask": "0x2" @@ -436,6 +439,7 @@ }, { "BriefDescription": "Instructions retired from execution.", + "EventCode": "0x0", "Errata": "HSD140, HSD143", "EventName": "INST_RETIRED.ANY", "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events.
Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.", diff --git a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json index 540f437..0d5eafd 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/pipeline.json @@ -303,6 +303,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state.", "SampleAfterValue": "2000003", @@ -327,6 +328,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "This event counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction.
The core frequency may change from time to time due to power or thermal throttling.", "SampleAfterValue": "2000003", @@ -335,6 +337,7 @@ { "AnyThread": "1", "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", "SampleAfterValue": "2000003", "UMask": "0x2" @@ -436,6 +439,7 @@ }, { "BriefDescription": "Instructions retired from execution.", + "EventCode": "0x0", "Errata": "HSD140, HSD143", "EventName": "INST_RETIRED.ANY", "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. INST_RETIRED.ANY is counted by a designated fixed counter, leaving the programmable counters available for other events. Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.", diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json index 9227cc2..64e2fb4 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-cache.json @@ -183,6 +183,7 @@ }, { "BriefDescription": "Uncore Clocks", + "EventCode": "0x0", "EventName": "UNC_C_CLOCKTICKS", "PerPkg": "1", "Unit": "CBOX" @@ -1698,6 +1699,7 @@ }, { "BriefDescription": "uclks", + "EventCode": "0x0", "EventName": "UNC_H_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "Counts the number of uclks in the HA. This will be slightly different than the count in the Ubox because of enable/freeze delays.
The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.", diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json index 954e8198..7c4fc13 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-interconnect.json @@ -1,6 +1,7 @@ [ { "BriefDescription": "Number of non data (control) flits transmitted . Derived from unc_q_txl_flits_g0.non_data", + "EventCode": "0x0", "EventName": "QPI_CTL_BANDWIDTH_TX", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link.
One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.", @@ -10,6 +11,7 @@ }, { "BriefDescription": "Number of data flits transmitted . Derived from unc_q_txl_flits_g0.data", + "EventCode": "0x0", "EventName": "QPI_DATA_BANDWIDTH_TX", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.
This does not include the header flits that go in data packets.", @@ -37,6 +39,7 @@ }, { "BriefDescription": "Clocks in the IRP", + "EventCode": "0x0", "EventName": "UNC_I_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "Number of clocks in the IRP.", @@ -1401,6 +1404,7 @@ }, { "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G0.DATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QPI. Each flit contains 64b of data. This includes both DRS and NCB data flits (coherent and non-coherent). This can be used to calculate the data bandwidth of the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits.
This does not include the header flits that go in data packets.", @@ -1409,6 +1413,7 @@ }, { "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transmitted across QPI. This basically tracks the protocol overhead on the QPI link. One can get a good picture of the QPI-link characteristics by evaluating the protocol flits, data flits, and idle/null flits. This includes the header flits for data packets.", @@ -1417,6 +1422,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.DRS", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link.
This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency.", @@ -1425,6 +1431,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.
When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of data flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the data flits (not the header).", @@ -1433,6 +1440,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth.
For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the total number of protocol flits transmitted over QPI on the DRS (Data Response) channel. DRS flits are used to transmit data with coherency. This does not count data flits transmitted over the NCB channel which transmits non-coherent data. This includes only the header flits (not the data). This includes extended headers.", @@ -1441,6 +1449,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; HOM Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.HOM", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information.
To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of flits transmitted over QPI on the home channel.", @@ -1449,6 +1458,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of non-request flits transmitted over QPI on the home channel. These are most commonly snoop responses, and this event can be used as a proxy for that.", @@ -1457,6 +1467,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits.
It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the system will transfer 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as data bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual data and an additional 16 bits of other information. To calculate data bandwidth, one should therefore do: data flits * 8B / time.; Counts the number of data request transmitted over QPI on the home channel. This basically counts the number of remote memory requests transmitted over QPI. In conjunction with the local read count in the Home Agent, one can calculate the number of LLC Misses.", @@ -1465,6 +1476,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; SNP Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.SNP", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three groups that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each flit is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four fits, each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit.
When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the numb= er of snoop request flits transmitted over QPI. These requests are contain= ed in the snoop channel. This does not include snoop responses, which are = transmitted on the home channel.", @@ -3136,6 +3148,7 @@ }, { "BriefDescription": "Uncore Clocks", + "EventCode": "0x0", "EventName": "UNC_S_CLOCKTICKS", "PerPkg": "1", "Unit": "SBOX" @@ -3823,6 +3836,7 @@ }, { "BriefDescription": "UNC_U_CLOCKTICKS", + "EventCode": "0x0", "EventName": "UNC_U_CLOCKTICKS", "PerPkg": "1", "Unit": "UBOX" diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json b/t= ools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json index c005f51..124c3ae 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-memory.json @@ -151,12 +151,14 @@ }, { "BriefDescription": "DRAM Clockticks", + "EventCode": "0x0", "EventName": "UNC_M_CLOCKTICKS", "PerPkg": "1", "Unit": "iMC" }, { "BriefDescription": "DRAM Clockticks", + "EventCode": "0x0", "EventName": "UNC_M_DCLOCKTICKS", "PerPkg": "1", "Unit": "iMC" diff --git a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json b/to= ols/perf/pmu-events/arch/x86/haswellx/uncore-power.json index daebf10..9276058 100644 --- a/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json +++ b/tools/perf/pmu-events/arch/x86/haswellx/uncore-power.json @@ -1,6 +1,7 @@ [ { 
"BriefDescription": "pclk Cycles", + "EventCode": "0x0", "EventName": "UNC_P_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "The PCU runs off a fixed 800 MHz clock. Thi= s event counts the number of pclk cycles measured while the counter was ena= bled. The pclk, like the Memory Controller's dclk, counts at a constant ra= te making it a good measure of actual wall time.", diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/p= erf/pmu-events/arch/x86/icelake/pipeline.json index 154fee4..0789412 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json @@ -193,6 +193,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. This e= vent has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is c= ounted on a dedicated fixed counter, leaving the eight programmable counter= s available for other events. Note: On all current platforms this event sto= ps counting during 'throttling (TM)' states duty off periods the processor = is 'halted'. The counter update is done at a lower clock rate then the cor= e clock the overflow status bit for this counter may appear 'sticky'. Afte= r the counter has overflowed and software clears the overflow status bit an= d resets the counter to less than MAX. 
The reset value to the counter is not clocked immediately = so the overflow status bit will flip 'high (1)' and generate another PMI (i= f enabled) after which the reset value gets clocked into the counter. There= fore, software will get the interrupt, read the overflow status bit '1 for = bit 34 while the counter value is less than MAX. Software should ignore thi= s case.", "SampleAfterValue": "2000003", @@ -208,6 +209,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", "SampleAfterValue": "2000003", @@ -359,6 +361,7 @@ }, { "BriefDescription": "Precise instruction retired event with a redu= ced effect of PEBS shadow in IP distribution", + "EventCode": "0x0", "EventName": "INST_RETIRED.PREC_DIST", "PEBS": "1", "PublicDescription": "A version of INST_RETIRED that allows for a = more unbiased distribution of samples across instructions retired. It utili= zes the Precise Distribution of Instructions Retired (PDIR) feature to miti= gate some bias in how retired instructions get sampled. Use on Fixed Counte= r 0.", @@ -562,6 +565,7 @@ }, { "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. 
Fixed counter - architectural event", + "EventCode": "0x0", "EventName": "TOPDOWN.SLOTS", "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", "SampleAfterValue": "10000003", diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/= perf/pmu-events/arch/x86/icelakex/pipeline.json index 442a4c7..9cfb341 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json @@ -193,6 +193,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. This e= vent has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is c= ounted on a dedicated fixed counter, leaving the eight programmable counter= s available for other events. Note: On all current platforms this event sto= ps counting during 'throttling (TM)' states duty off periods the processor = is 'halted'. The counter update is done at a lower clock rate then the cor= e clock the overflow status bit for this counter may appear 'sticky'. 
Afte= r the counter has overflowed and software clears the overflow status bit an= d resets the counter to less than MAX. The reset value to the counter is not clocked immediately = so the overflow status bit will flip 'high (1)' and generate another PMI (i= f enabled) after which the reset value gets clocked into the counter. There= fore, software will get the interrupt, read the overflow status bit '1 for = bit 34 while the counter value is less than MAX. Software should ignore thi= s case.", "SampleAfterValue": "2000003", @@ -208,6 +209,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", "SampleAfterValue": "2000003", @@ -359,6 +361,7 @@ }, { "BriefDescription": "Precise instruction retired event with a redu= ced effect of PEBS shadow in IP distribution", + "EventCode": "0x0", "EventName": "INST_RETIRED.PREC_DIST", "PEBS": "1", "PublicDescription": "A version of INST_RETIRED that allows for a = more unbiased distribution of samples across instructions retired. It utili= zes the Precise Distribution of Instructions Retired (PDIR) feature to miti= gate some bias in how retired instructions get sampled. 
Use on Fixed Counte= r 0.", @@ -544,6 +547,7 @@ }, { "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", + "EventCode": "0x0", "EventName": "TOPDOWN.SLOTS", "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", "SampleAfterValue": "10000003", diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json b/to= ols/perf/pmu-events/arch/x86/icelakex/uncore-cache.json index b6ce14e..ae57663 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-cache.json @@ -892,6 +892,7 @@ }, { "BriefDescription": "Clockticks of the uncore caching and home age= nt (CHA)", + "EventCode": "0x0", "EventName": "UNC_CHA_CLOCKTICKS", "PerPkg": "1", "Unit": "CHA" diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.js= on b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json index 8ac5907..1b821b6 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json @@ -1419,6 +1419,7 @@ }, { "BriefDescription": "Clockticks of the mesh to memory (M2M)", + "EventCode": "0x0", "EventName": "UNC_M2M_CLOCKTICKS", "PerPkg": "1", "Unit": "M2M" diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json b/t= ools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json index 814d959..b0b2f27 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json +++ 
b/tools/perf/pmu-events/arch/x86/icelakex/uncore-memory.json @@ -100,6 +100,7 @@ }, { "BriefDescription": "DRAM Clockticks", + "EventCode": "0x0", "EventName": "UNC_M_CLOCKTICKS", "PerPkg": "1", "Unit": "iMC" diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json b/to= ols/perf/pmu-events/arch/x86/icelakex/uncore-power.json index ee4dac6..9c4cd59 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-power.json @@ -1,6 +1,7 @@ [ { "BriefDescription": "Clockticks of the power control unit (PCU)", + "EventCode": "0x0", "EventName": "UNC_P_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "Clockticks of the power control unit (PCU) := The PCU runs off a fixed 1 GHz clock. This event counts the number of pcl= k cycles measured while the counter was enabled. The pclk, like the Memory= Controller's dclk, counts at a constant rate making it a good measure of a= ctual wall time.", diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json b/tools= /perf/pmu-events/arch/x86/ivybridge/pipeline.json index 30a3da9..2df2d21 100644 --- a/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/ivybridge/pipeline.json @@ -326,6 +326,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "SampleAfterValue": "2000003", "UMask": "0x3" @@ -348,6 +349,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "SampleAfterValue": "2000003", "UMask": "0x2" @@ -355,6 +357,7 @@ { "AnyThread": "1", "BriefDescription": "Core cycles when at least one thread on the p= hysical core is not in halt state", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", "PublicDescription": "Core cycles when at least one thread on the = physical core is not in halt state.", 
"SampleAfterValue": "2000003", diff --git a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json b/tools/p= erf/pmu-events/arch/x86/ivytown/pipeline.json index 30a3da9..6f6f281 100644 --- a/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/ivytown/pipeline.json @@ -326,6 +326,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "SampleAfterValue": "2000003", "UMask": "0x3" @@ -348,6 +349,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "SampleAfterValue": "2000003", "UMask": "0x2" @@ -355,6 +357,7 @@ { "AnyThread": "1", "BriefDescription": "Core cycles when at least one thread on the p= hysical core is not in halt state", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", "PublicDescription": "Core cycles when at least one thread on the = physical core is not in halt state.", "SampleAfterValue": "2000003", @@ -510,6 +513,7 @@ }, { "BriefDescription": "Instructions retired from execution.", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "SampleAfterValue": "2000003", "UMask": "0x1" diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json b/too= ls/perf/pmu-events/arch/x86/ivytown/uncore-cache.json index 8bf2706..31e58fb 100644 --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-cache.json @@ -1,6 +1,7 @@ [ { "BriefDescription": "Uncore Clocks", + "EventCode": "0x0", "EventName": "UNC_C_CLOCKTICKS", "PerPkg": "1", "Unit": "CBOX" @@ -1533,6 +1534,7 @@ }, { "BriefDescription": "uclks", + "EventCode": "0x0", "EventName": "UNC_H_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "Counts the number of uclks in the HA. This = will be slightly different than the count in the Ubox because of enable/fre= eze delays. 
The HA is on the other side of the die from the fixed Ubox ucl= k counter, so the drift could be somewhat larger than in units that are clo= ser like the QPI Agent.", diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.jso= n b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json index ccf45153..f2492ec7 100644 --- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-interconnect.json @@ -109,6 +109,7 @@ }, { "BriefDescription": "Clocks in the IRP", + "EventCode": "0x0", "EventName": "UNC_I_CLOCKTICKS", "PerPkg": "1", "PublicDescription": "Number of clocks in the IRP.", @@ -1522,6 +1523,7 @@ }, { "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G0.DATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. It includes filters for Idle, protocol, and Data Flits. E= ach flit is made up of 80 bits of information (in addition to some ECC data= ). In full-width (L0) mode, flits are made up of four fits, each of which = contains 20 bits of data (along with some additional ECC data). In half-w= idth (L0p) mode, the fits are only 10 bits, and therefore it takes twice as= many fits to transmit a flit. When one talks about QPI speed (for example= , 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the syste= m will transfer 1 flit at the rate of 1/4th the QPI speed. One can calcula= te the bandwidth of the link by taking: flits*80b/time. Note that this is = not the same as data bandwidth. For example, when we are transferring a 64= B cacheline across QPI, we will break it into 9 flits -- 1 with header info= rmation and 8 with 64 bits of actual data and an additional 16 bits of othe= r information. 
To calc ulate data bandwidth, one should therefore do: data flits * 8B / time (for= L0) or 4B instead of 8B for L0p.; Number of data flits transmitted over QP= I. Each flit contains 64b of data. This includes both DRS and NCB data fl= its (coherent and non-coherent). This can be used to calculate the data ba= ndwidth of the QPI link. One can get a good picture of the QPI-link charac= teristics by evaluating the protocol flits, data flits, and idle/null flits= . This does not include the header flits that go in data packets.", @@ -1530,6 +1532,7 @@ }, { "BriefDescription": "Flits Transferred - Group 0; Non-Data protoco= l Tx Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. It includes filters for Idle, protocol, and Data Flits. E= ach flit is made up of 80 bits of information (in addition to some ECC data= ). In full-width (L0) mode, flits are made up of four fits, each of which = contains 20 bits of data (along with some additional ECC data). In half-w= idth (L0p) mode, the fits are only 10 bits, and therefore it takes twice as= many fits to transmit a flit. When one talks about QPI speed (for example= , 8.0 GT/s), the transfers here refer to fits. Therefore, in L0, the syste= m will transfer 1 flit at the rate of 1/4th the QPI speed. One can calcula= te the bandwidth of the link by taking: flits*80b/time. Note that this is = not the same as data bandwidth. For example, when we are transferring a 64= B cacheline across QPI, we will break it into 9 flits -- 1 with header info= rmation and 8 with 64 bits of actual data and an additional 16 bits of othe= r information. To calc ulate data bandwidth, one should therefore do: data flits * 8B / time (for= L0) or 4B instead of 8B for L0p.; Number of non-NULL non-data flits transm= itted across QPI. This basically tracks the protocol overhead on the QPI l= ink. 
One can get a good picture of the QPI-link characteristics by evaluat= ing the protocol flits, data flits, and idle/null flits. This includes the= header flits for data packets.", @@ -1538,6 +1541,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both = Header and Data)", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.DRS", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. = It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the tota= l number of flits transmitted over QPI on the DRS (Data Response) channel. = DRS flits are used to transmit data with coherency.", @@ -1546,6 +1550,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. 
= It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the tota= l number of data flits transmitted over QPI on the DRS (Data Response) chan= nel. DRS flits are used to transmit data with coherency. This does not co= unt data flits transmitted over the NCB channel which transmits non-coheren= t data. This includes only the data flits (not the header).", @@ -1554,6 +1559,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits= ", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. = It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). 
In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the tota= l number of protocol flits transmitted over QPI on the DRS (Data Response) = channel. DRS flits are used to transmit data with coherency. This does no= t count data flits transmitted over the NCB channel which transmits non-coh= erent data. This includes only the header flits (not the data). This incl= udes extended headers.", @@ -1562,6 +1568,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; HOM Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.HOM", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. = It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. 
One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the numb= er of flits transmitted over QPI on the home channel.", @@ -1570,6 +1577,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request = Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. = It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the numb= er of non-request flits transmitted over QPI on the home channel. 
These ar= e most commonly snoop responses, and this event can be used as a proxy for = that.", @@ -1578,6 +1586,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; HOM Request Flit= s", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. = It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the numb= er of data request transmitted over QPI on the home channel. This basicall= y counts the number of remote memory requests transmitted over QPI. In con= junction with the local read count in the Home Agent, one can calculate the= number of LLC Misses.", @@ -1586,6 +1595,7 @@ }, { "BriefDescription": "Flits Transferred - Group 1; SNP Flits", + "EventCode": "0x0", "EventName": "UNC_Q_TxL_FLITS_G1.SNP", "PerPkg": "1", "PublicDescription": "Counts the number of flits transmitted acros= s the QPI Link. This is one of three groups that allow us to track flits. 
= It includes filters for SNP, HOM, and DRS message classes. Each flit is m= ade up of 80 bits of information (in addition to some ECC data). In full-w= idth (L0) mode, flits are made up of four fits, each of which contains 20 b= its of data (along with some additional ECC data). In half-width (L0p) mo= de, the fits are only 10 bits, and therefore it takes twice as many fits to= transmit a flit. When one talks about QPI speed (for example, 8.0 GT/s), = the transfers here refer to fits. Therefore, in L0, the system will transf= er 1 flit at the rate of 1/4th the QPI speed. One can calculate the bandwi= dth of the link by taking: flits*80b/time. Note that this is not the same = as data bandwidth. For example, when we are transferring a 64B cacheline a= cross QPI, we will break it into 9 flits -- 1 with header information and 8= with 64 bits of actual data and an additional 16 bits of other information. To calculate data b= andwidth, one should therefore do: data flits * 8B / time.; Counts the numb= er of snoop request flits transmitted over QPI. These requests are contain= ed in the snoop channel. 
This does not include snoop responses, which are transmitted on the home channel.",
@@ -3104,6 +3114,7 @@
     },
     {
         "EventName": "UNC_U_CLOCKTICKS",
+        "EventCode": "0x0",
         "PerPkg": "1",
         "Unit": "UBOX"
     },
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
index 6550934..869a320 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-memory.json
@@ -131,6 +131,7 @@
     },
     {
         "BriefDescription": "DRAM Clockticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_DCLOCKTICKS",
         "PerPkg": "1",
         "Unit": "iMC"
diff --git a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
index 5df1ebf..0a5d0c3 100644
--- a/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/ivytown/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 800 MHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
index d0edfde..76b515d 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/pipeline.json
@@ -329,6 +329,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "This event counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -351,6 +352,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "This event counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -359,6 +361,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -432,6 +435,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
index 63395e7e..160f1c4 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-cache.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Uncore Clocks",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CBOX"
@@ -863,6 +864,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_H_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of uclks in the HA. This will be slightly different than the count in the Ubox because of enable/freeze delays. The HA is on the other side of the die from the fixed Ubox uclk counter, so the drift could be somewhat larger than in units that are closer like the QPI Agent.",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
index 874f15e..45f2966 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-interconnect.json
@@ -109,6 +109,7 @@
     },
     {
         "BriefDescription": "Clocks in the IRP",
+        "EventCode": "0x0",
         "EventName": "UNC_I_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Number of clocks in the IRP.",
@@ -847,6 +848,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Data Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
@@ -855,6 +857,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Idle and Null Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.IDLE",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
@@ -863,6 +866,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 0; Non-Data protocol Tx Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G0.NON_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. It includes filters for Idle, protocol, and Data Flits. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time (for L0) or 4B instead of 8B for L0p.",
@@ -871,6 +875,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Flits (both Header and Data)",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -879,6 +884,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Data Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_DATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -887,6 +893,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; DRS Header Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.DRS_NONDATA",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -895,6 +902,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -903,6 +911,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Non-Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_NONREQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -911,6 +920,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; HOM Request Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.HOM_REQ",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -919,6 +929,7 @@
     },
     {
         "BriefDescription": "Flits Transferred - Group 1; SNP Flits",
+        "EventCode": "0x0",
         "EventName": "UNC_Q_TxL_FLITS_G1.SNP",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of flits transmitted across the QPI Link. This is one of three 'groups' that allow us to track flits. It includes filters for SNP, HOM, and DRS message classes. Each 'flit' is made up of 80 bits of information (in addition to some ECC data). In full-width (L0) mode, flits are made up of four 'fits', each of which contains 20 bits of data (along with some additional ECC data). In half-width (L0p) mode, the fits are only 10 bits, and therefore it takes twice as many fits to transmit a flit. When one talks about QPI 'speed' (for example, 8.0 GT/s), the 'transfers' here refer to 'fits'. Therefore, in L0, the system will transfer 1 'flit' at the rate of 1/4th the QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as 'data' bandwidth. For example, when we are transferring a 64B cacheline across QPI, we will break it into 9 flits -- 1 with header information and 8 with 64 bits of actual 'data' and an additional 16 bits of other information. To calculate 'data' bandwidth, one should therefore do: data flits * 8B / time.",
@@ -1576,6 +1587,7 @@
     },
     {
         "EventName": "UNC_U_CLOCKTICKS",
+        "EventCode": "0x0",
         "PerPkg": "1",
         "Unit": "UBOX"
     },
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
index 6dcc9415..2385b0a 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-memory.json
@@ -65,6 +65,7 @@
     },
     {
         "BriefDescription": "uclks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Uncore Fixed Counter - uclks",
diff --git a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
index b3ee5d7..f453afd 100644
--- a/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/jaketown/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 800 MHz clock. This event counts the number of pclk cycles measured while the counter was enabled. The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
@@ -216,6 +217,7 @@
     },
     {
         "BriefDescription": "Cycles spent changing Frequency",
+        "EventCode": "0x0",
         "EventName": "UNC_P_FREQ_TRANS_CYCLES",
         "PerPkg": "1",
         "PublicDescription": "Counts the number of cycles when the system is changing frequency. This can not be filtered by thread ID.
One can als= o use it with the occupancy counter that monitors number of threads in C0 t= o estimate the performance impact that frequency transitions had on the sys= tem.", diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json b/= tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json index 3dc5321..a74d45a 100644 --- a/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/knightslanding/pipeline.json @@ -150,12 +150,14 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = reference clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "SampleAfterValue": "2000003", "UMask": "0x3" }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = core clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "This event counts the number of core cycles = while the thread is not in a halt state. The thread enters the halt state w= hen it is running the HLT instruction. This event is a component in many ke= y event ratios. The core frequency may change from time to time due to tran= sitions associated with Enhanced Intel SpeedStep Technology or TM2. For thi= s reason this event may have a changing ratio with regards to time. When th= e core frequency is constant, this event can approximate elapsed time while= the core was not in the halt state. It is counted on a dedicated fixed cou= nter", "SampleAfterValue": "2000003", @@ -177,6 +179,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of instructi= ons retired", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PublicDescription": "This event counts the number of instructions= that retire. For instructions that consist of multiple micro-ops, this ev= ent counts exactly once, as the last micro-op of the instruction retires. 
= The event continues counting while instructions retire, including during in= terrupt service routines caused by hardware interrupts, faults or traps.", "SampleAfterValue": "2000003", diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.jso= n b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json index 1b8dcfa..c062253 100644 --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-cache.json @@ -3246,6 +3246,7 @@ }, { "BriefDescription": "Uncore Clocks", + "EventCode": "0x0", "EventName": "UNC_H_U_CLOCKTICKS", "PerPkg": "1", "Unit": "CHA" diff --git a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.js= on b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json index fb75297..3575baa 100644 --- a/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/knightslanding/uncore-memory.json @@ -41,6 +41,7 @@ }, { "BriefDescription": "ECLK count", + "EventCode": "0x0", "EventName": "UNC_E_E_CLOCKTICKS", "PerPkg": "1", "Unit": "EDC_ECLK" @@ -55,6 +56,7 @@ }, { "BriefDescription": "UCLK count", + "EventCode": "0x0", "EventName": "UNC_E_U_CLOCKTICKS", "PerPkg": "1", "Unit": "EDC_UCLK" @@ -93,12 +95,14 @@ }, { "BriefDescription": "DCLK count", + "EventCode": "0x0", "EventName": "UNC_M_D_CLOCKTICKS", "PerPkg": "1", "Unit": "iMC_DCLK" }, { "BriefDescription": "UCLK count", + "EventCode": "0x0", "EventName": "UNC_M_U_CLOCKTICKS", "PerPkg": "1", "Unit": "iMC_UCLK" diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tool= s/perf/pmu-events/arch/x86/meteorlake/pipeline.json index 6397894..0de3572 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json @@ -37,6 +37,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = core clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.CORE", 
"SampleAfterValue": "2000003", "UMask": "0x2", @@ -51,6 +52,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = reference clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "SampleAfterValue": "2000003", "UMask": "0x3", @@ -58,6 +60,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the eight programmable counte= rs available for other events. Note: On all current platforms this event st= ops counting during 'throttling (TM)' states duty off periods the processor= is 'halted'. The counter update is done at a lower clock rate then the co= re clock the overflow status bit for this counter may appear 'sticky'. Aft= er the counter has overflowed and software clears the overflow status bit a= nd resets the counter to less than MAX. The reset value to the counter is n= ot clocked immediately=20 so the overflow status bit will flip 'high (1)' and generate another PMI (= if enabled) after which the reset value gets clocked into the counter. Ther= efore, software will get the interrupt, read the overflow status bit '1 for= bit 34 while the counter value is less than MAX. 
Software should ignore th= is case.", "SampleAfterValue": "2000003", @@ -75,6 +78,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = core clock cycles", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "SampleAfterValue": "2000003", "UMask": "0x2", @@ -82,6 +86,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", "SampleAfterValue": "2000003", @@ -105,6 +110,7 @@ }, { "BriefDescription": "Fixed Counter: Counts the number of instructi= ons retired", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "1", "SampleAfterValue": "2000003", @@ -113,6 +119,7 @@ }, { "BriefDescription": "Number of instructions retired. Fixed Counter= - architectural event", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "1", "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. 
INST_RETIRED.ANY_P is counted by a programmable counter.", @@ -157,6 +164,7 @@ }, { "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", + "EventCode": "0x0", "EventName": "TOPDOWN.SLOTS", "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", "SampleAfterValue": "10000003", diff --git a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json b/too= ls/perf/pmu-events/arch/x86/sandybridge/pipeline.json index ecaf94c..973a5f4 100644 --- a/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/sandybridge/pipeline.json @@ -337,6 +337,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "This event counts the number of reference cy= cles when the core is not in a halt state. The core enters the halt state w= hen it is running the HLT instruction or the MWAIT instruction. This event = is not affected by core frequency changes (for example, P states, TM2 trans= itions) but has the same incrementing frequency as the time stamp counter. = This event can approximate elapsed time while the core was not in a halt st= ate. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK eve= nt. 
It is counted on a dedicated fixed counter, leaving the four (eight whe= n Hyperthreading is disabled) programmable counters available for other eve= nts.", "SampleAfterValue": "2000003", @@ -359,6 +360,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "This event counts the number of core cycles = while the thread is not in a halt state. The thread enters the halt state w= hen it is running the HLT instruction. This event is a component in many ke= y event ratios. The core frequency may change from time to time due to tran= sitions associated with Enhanced Intel SpeedStep Technology or TM2. For thi= s reason this event may have a changing ratio with regards to time. When th= e core frequency is constant, this event can approximate elapsed time while= the core was not in the halt state. It is counted on a dedicated fixed cou= nter, leaving the four (eight when Hyperthreading is disabled) programmable= counters available for other events.", "SampleAfterValue": "2000003", @@ -367,6 +369,7 @@ { "AnyThread": "1", "BriefDescription": "Core cycles when at least one thread on the p= hysical core is not in halt state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD_ANY", "SampleAfterValue": "2000003", "UMask": "0x2" @@ -440,6 +443,7 @@ }, { "BriefDescription": "Instructions retired from execution.", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PublicDescription": "This event counts the number of instructions= retired from execution. For instructions that consist of multiple micro-op= s, this event counts the retirement of the last micro-op of the instruction= . 
Counting continues during hardware interrupts, traps, and inside interrup= t handlers.", "SampleAfterValue": "2000003", diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/= tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json index 72e9bdfa..ada2c34 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json @@ -284,6 +284,7 @@ }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.REF_TSC", "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. It is = counted on a dedicated fixed counter, leaving the eight programmable counte= rs available for other events. Note: On all current platforms this event st= ops counting during 'throttling (TM)' states duty off periods the processor= is 'halted'. The counter update is done at a lower clock rate then the co= re clock the overflow status bit for this counter may appear 'sticky'. Aft= er the counter has overflowed and software clears the overflow status bit a= nd resets the counter to less than MAX. The reset value to the counter is n= ot clocked immediately=20 so the overflow status bit will flip 'high (1)' and generate another PMI (= if enabled) after which the reset value gets clocked into the counter. Ther= efore, software will get the interrupt, read the overflow status bit '1 for= bit 34 while the counter value is less than MAX. 
Software should ignore th= is case.", "SampleAfterValue": "2000003", @@ -299,6 +300,7 @@ }, { "BriefDescription": "Core cycles when the thread is not in halt st= ate", + "EventCode": "0x0", "EventName": "CPU_CLK_UNHALTED.THREAD", "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", "SampleAfterValue": "2000003", @@ -426,6 +428,7 @@ }, { "BriefDescription": "Number of instructions retired. Fixed Counter= - architectural event", + "EventCode": "0x0", "EventName": "INST_RETIRED.ANY", "PEBS": "1", "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter.", @@ -457,6 +460,7 @@ }, { "BriefDescription": "Precise instruction retired with PEBS precise= -distribution", + "EventCode": "0x0", "EventName": "INST_RETIRED.PREC_DIST", "PEBS": "1", "PublicDescription": "A version of INST_RETIRED that allows for a = precise distribution of samples across instructions retired. It utilizes th= e Precise Distribution of Instructions Retired (PDIR++) feature to fix bias= in how retired instructions get sampled. 
Use on Fixed Counter 0.",
@@ -719,6 +723,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
diff --git a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
index 4121295..67be689 100644
--- a/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/sierraforest/pipeline.json
@@ -17,6 +17,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -29,6 +30,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "SampleAfterValue": "2000003",
         "UMask": "0x3"
@@ -43,6 +45,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -55,6 +58,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
index 2d4214b..6423c01 100644
--- a/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/silvermont/pipeline.json
@@ -143,6 +143,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted core clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.CORE",
         "PublicDescription": "Counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time. For this reason this event may have a changing ratio with regards to time. In systems with a constant core frequency, this event can give you a measurement of the elapsed time while the core was not in halt state by dividing the event count by the core frequency. This event is architecturally defined and is a designated fixed counter. CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time. CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
         "SampleAfterValue": "2000003",
@@ -165,6 +166,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of unhalted reference clock cycles",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time.
This event is not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. Divide this event count by core frequency to determine the elapsed time while the core was not in halt state. Divide this event count by core frequency to determine the elapsed time while the core was not in halt state. This event is architecturally defined and is a designated fixed counter. CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.CORE_P use the core frequency which may change from time to time. CPU_CLK_UNHALTE.REF_TSC and CPU_CLK_UNHALTED.REF are not affected by core frequency changes but counts as if the core is running at the maximum frequency all the time. The fixed events are CPU_CLK_UNHALTED.CORE and CPU_CLK_UNHALTED.REF_TSC and the programmable events are CPU_CLK_UNHALTED.CORE_P and CPU_CLK_UNHALTED.REF.",
         "SampleAfterValue": "2000003",
@@ -180,6 +182,7 @@
     },
     {
         "BriefDescription": "Fixed Counter: Counts the number of instructions retired",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "This event counts the number of instructions that retire. For instructions that consist of multiple micro-ops, this event counts exactly once, as the last micro-op of the instruction retires. The event continues counting while instructions retire, including during interrupt service routines caused by hardware interrupts, faults or traps. Background: Modern microprocessors employ extensive pipelining and speculative techniques. Since sometimes an instruction is started but never completed, the notion of \"retirement\" is introduced. A retired instruction is one that commits its states. Or stated differently, an instruction might be abandoned at some point. No instruction is truly finished until it retires. This counter measures the number of completed instructions.
The fixed event is INST_RETIRED.ANY and the programmable event is INST_RETIRED.ANY_P.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
index 2dfc3af..53f1381 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json
@@ -182,6 +182,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX.
Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -213,6 +214,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -221,6 +223,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -360,6 +363,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event.
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
index 0f06e31..99346e1 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
@@ -191,6 +191,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX.
Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -222,6 +223,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -230,6 +232,7 @@
     {
         "AnyThread": "1",
         "BriefDescription": "Core cycles when at least one thread on the physical core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD_ANY",
         "SampleAfterValue": "2000003",
         "UMask": "0x2"
@@ -369,6 +372,7 @@
     },
     {
         "BriefDescription": "Instructions retired from execution.",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PublicDescription": "Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, Counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event.
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.",
         "SampleAfterValue": "2000003",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
index 543dfc1..4df1294 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-cache.json
@@ -460,6 +460,7 @@
     },
     {
         "BriefDescription": "Clockticks of the uncore caching & home agent (CHA)",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the clock controlling the uncore caching and home agent (CHA).",
@@ -5678,6 +5679,7 @@
     {
         "BriefDescription": "This event is deprecated. Refer to new event UNC_CHA_CLOCKTICKS",
         "Deprecated": "1",
+        "EventCode": "0x0",
         "EventName": "UNC_C_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CHA"
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
index 26a5a20..40f609c 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
@@ -1090,6 +1090,7 @@
     },
     {
         "BriefDescription": "Cycles - at UCLK",
+        "EventCode": "0x0",
         "EventName": "UNC_M2M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "M2M"
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
index 2a3a709..21a6a0f 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-io.json
@@ -1271,6 +1271,7 @@
     },
     {
         "BriefDescription": "Counting disabled",
+        "EventCode": "0x0",
         "EventName": "UNC_IIO_NOTHING",
         "PerPkg": "1",
         "Unit": "IIO"
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
index 6f8ff22..a7ce916 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
@@ -167,6 +167,7 @@
     },
     {
         "BriefDescription": "Memory controller clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Counts clockticks of the fixed frequency clock of the memory controller using one of the programmable counters.",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
index c6254af..a01b279 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "pclk Cycles",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "The PCU runs off a fixed 1 GHz clock.  This event counts the number of pclk cycles measured while the counter was enabled.  The pclk, like the Memory Controller's dclk, counts at a constant rate making it a good measure of actual wall time.",
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
index 9dd8c90..3388cd5 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/pipeline.json
@@ -150,6 +150,7 @@
     },
     {
         "BriefDescription": "Counts the number of unhalted reference clock cycles at TSC frequency. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles that the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is not affected by core frequency changes and increments at a fixed frequency that is also used for the Time Stamp Counter (TSC).
This event uses fixed counter 2.",
         "SampleAfterValue": "2000003",
@@ -179,6 +180,7 @@
     },
     {
         "BriefDescription": "Counts the total number of instructions retired. (Fixed event)",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the total number of instructions that retired. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. This event continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0.",
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
index a68a5bb..279381b 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-cache.json
@@ -872,6 +872,7 @@
     },
     {
         "BriefDescription": "Uncore cache clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_CHA_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "CHA"
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
index de38400..399536f 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-interconnect.json
@@ -1419,6 +1419,7 @@
     },
     {
         "BriefDescription": "Clockticks of the mesh to memory (M2M)",
+        "EventCode": "0x0",
         "EventName": "UNC_M2M_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "M2M"
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
index 530e9b71..b24ba35 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-memory.json
@@ -120,6 +120,7 @@
     },
     {
         "BriefDescription": "Memory controller clock ticks",
+        "EventCode": "0x0",
         "EventName": "UNC_M_CLOCKTICKS",
         "PerPkg": "1",
         "PublicDescription": "Clockticks of the integrated memory controller (IMC)",
diff --git a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
index 27fc155..5c04d6e 100644
--- a/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
+++ b/tools/perf/pmu-events/arch/x86/snowridgex/uncore-power.json
@@ -1,6 +1,7 @@
 [
     {
         "BriefDescription": "Clockticks of the power control unit (PCU)",
+        "EventCode": "0x0",
         "EventName": "UNC_P_CLOCKTICKS",
         "PerPkg": "1",
         "Unit": "PCU"
diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
index a0aeeb8..54a81f9 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
@@ -193,6 +193,7 @@
     },
     {
         "BriefDescription": "Reference cycles when the core is not in halt state.",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.REF_TSC",
         "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.
After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
         "SampleAfterValue": "2000003",
@@ -208,6 +209,7 @@
     },
     {
         "BriefDescription": "Core cycles when the thread is not in halt state",
+        "EventCode": "0x0",
         "EventName": "CPU_CLK_UNHALTED.THREAD",
         "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.",
         "SampleAfterValue": "2000003",
@@ -352,6 +354,7 @@
     },
     {
         "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.ANY",
         "PEBS": "1",
         "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events.
INST_RETIRED.ANY_P is counted by a programmable counter.",
@@ -377,6 +380,7 @@
     },
     {
         "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution",
+        "EventCode": "0x0",
         "EventName": "INST_RETIRED.PREC_DIST",
         "PEBS": "1",
         "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
@@ -569,6 +573,7 @@
     },
     {
         "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event",
+        "EventCode": "0x0",
         "EventName": "TOPDOWN.SLOTS",
         "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method.
This architectural event is counted on a designated fixed counter (Fixed Counter 3).",
         "SampleAfterValue": "10000003",
-- 
1.8.3.1

From nobody Wed Dec 17 12:43:45 2025
From: Jing Zhang
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org, Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 4/8] perf jevents: Support more event fields
Date: Mon, 21 Aug 2023 16:36:13 +0800
Message-Id: <1692606977-92009-5-git-send-email-renyu.zj@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>
References: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>

The previous code assumed an event has either an "event=" or a "config" field at the beginning. For CMN, neither of these may be present, as an event is typically "type=xx,eventid=xxx". If EventCode and ConfigCode are not present in the alias JSON file, the event description adds "event=0" by default, so even when the event carries "eventid=xxx" and "type=xxx" fields, the final parsing result for CMN events would be "event=0,eventid=xxx,type=xxx".

Therefore, when EventCode and ConfigCode are both missing from the JSON, no longer add "event=0" by default, and add EventIdCode and Type to the supported event fields.

I compared the generated pmu-events.c before and after compiling with JEVENT_ARCH=all; they are consistent.
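The field handling described above can be sketched as follows. This is a hypothetical, simplified standalone model of the jevents.py logic, not the patch itself; the helper name `format_event` and the inline hex formatting (the real script uses an `llx()` helper) are assumptions for illustration:

```python
def format_event(jd):
    """Build a perf event string from a jevents-style JSON dict.

    "event=0" is emitted only when EventCode is actually present;
    otherwise extra fields such as NodeType/EventIdCode can start
    the string on their own (the CMN case).
    """
    eventcode = int(jd["EventCode"], 0) if "EventCode" in jd else None
    configcode = int(jd["ConfigCode"], 0) if "ConfigCode" in jd else None

    event = None
    if eventcode is not None:
        event = f"event={eventcode:#x}"
    elif configcode is not None:
        event = f"config={configcode:#x}"

    # Extra fields either extend the string or begin it when neither
    # EventCode nor ConfigCode was given.
    for key, prefix in (("NodeType", "type="), ("EventIdCode", "eventid=")):
        if key in jd and jd[key] != "0":
            term = prefix + jd[key]
            event = f"{event},{term}" if event else term
    return event
```

With this shape, a CMN-style alias such as `{"NodeType": "0x05", "EventIdCode": "0x01"}` yields `type=0x05,eventid=0x01` with no spurious `event=0` prefix.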
Signed-off-by: Jing Zhang
---
 tools/perf/pmu-events/jevents.py | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index f57a8f2..369c8bf 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -275,11 +275,14 @@ class JsonEvent:
       }
       return table[unit] if unit in table else f'uncore_{unit.lower()}'

-    eventcode = 0
+    eventcode = None
     if 'EventCode' in jd:
       eventcode = int(jd['EventCode'].split(',', 1)[0], 0)
     if 'ExtSel' in jd:
-      eventcode |= int(jd['ExtSel']) << 8
+      if eventcode is None:
+        eventcode = int(jd['ExtSel']) << 8
+      else:
+        eventcode |= int(jd['ExtSel']) << 8
     configcode = int(jd['ConfigCode'], 0) if 'ConfigCode' in jd else None
     self.name = jd['EventName'].lower() if 'EventName' in jd else None
     self.topic = ''
@@ -317,7 +320,11 @@ class JsonEvent:
     if precise and self.desc and '(Precise Event)' not in self.desc:
       extra_desc += ' (Must be precise)' if precise == '2' else (' (Precise '
                                                                  'event)')
-    event = f'config={llx(configcode)}' if configcode is not None else f'event={llx(eventcode)}'
+    event = None
+    if eventcode is not None:
+      event = f'event={llx(eventcode)}'
+    elif configcode is not None:
+      event = f'config={llx(configcode)}'
     event_fields = [
       ('AnyThread', 'any='),
       ('PortMask', 'ch_mask='),
@@ -327,10 +334,15 @@ class JsonEvent:
       ('Invert', 'inv='),
       ('SampleAfterValue', 'period='),
       ('UMask', 'umask='),
+      ('NodeType', 'type='),
+      ('EventIdCode', 'eventid='),
     ]
     for key, value in event_fields:
       if key in jd and jd[key] != '0':
-        event += ',' + value + jd[key]
+        if event:
+          event += ',' + value + jd[key]
+        else:
+          event = value + jd[key]
     if filter:
       event += f',{filter}'
     if msr:
-- 
1.8.3.1

From nobody Wed Dec 17 12:43:45 2025
From: Jing Zhang
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org, Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 5/8] perf test: Make matching_pmu effective
Date: Mon, 21 Aug 2023 16:36:14 +0800
Message-Id: <1692606977-92009-6-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>
References: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>

The perf_pmu_test_event.matching_pmu field didn't work: whatever value it held, it had no effect on the test results. Make matching_pmu effective by matching it against perf_pmu_test_pmu.pmu.name.

Signed-off-by: Jing Zhang
Reviewed-by: John Garry
---
 tools/perf/tests/pmu-events.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 1dff863b..3204252 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -238,7 +238,7 @@ struct perf_pmu_test_pmu {
 	},
 	.alias_str = "event=0x2b",
 	.alias_long_desc = "ddr write-cycles event. Unit: uncore_sys_ddr_pmu ",
-	.matching_pmu = "uncore_sys_ddr_pmu",
+	.matching_pmu = "uncore_sys_ddr_pmu0",
 };

 static const struct perf_pmu_test_event sys_ccn_pmu_read_cycles = {
@@ -252,7 +252,7 @@ struct perf_pmu_test_pmu {
 	},
 	.alias_str = "config=0x2c",
 	.alias_long_desc = "ccn read-cycles event. Unit: uncore_sys_ccn_pmu ",
-	.matching_pmu = "uncore_sys_ccn_pmu",
+	.matching_pmu = "uncore_sys_ccn_pmu4",
 };

 static const struct perf_pmu_test_event *sys_events[] = {
@@ -599,6 +599,11 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 		struct pmu_event const *event = &test_event->event;

 		if (!strcmp(event->name, alias->name)) {
+			if (strcmp(pmu_name, test_event->matching_pmu)) {
+				pr_debug("testing aliases uncore PMU %s: mismatched matching_pmu, %s vs %s\n",
+						pmu_name, test_event->matching_pmu, pmu_name);
+				continue;
+			}
 			if (compare_alias_to_test_event(alias, test_event, pmu_name)) {
-- 
1.8.3.1

From nobody Wed Dec 17 12:43:45 2025
From: Jing Zhang
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland, Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org, linux-doc@vger.kernel.org, Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 6/8] perf test: Add pmu-event test for "Compat" and new event_field.
Date: Mon, 21 Aug 2023 16:36:15 +0800
Message-Id: <1692606977-92009-7-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>
Content-Type: text/plain; charset="utf-8"

Add a new event test for an uncore system event, verifying the
functionality of "Compat" matching multiple identifiers and of the new
event fields "EventIdCode" and "Type".

Signed-off-by: Jing Zhang
---
 .../pmu-events/arch/test/test_soc/sys/uncore.json |  8 ++++
 tools/perf/pmu-events/empty-pmu-events.c          |  8 ++++
 tools/perf/tests/pmu-events.c                     | 55 ++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
index c7e7528..06b886d 100644
--- a/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
+++ b/tools/perf/pmu-events/arch/test/test_soc/sys/uncore.json
@@ -12,5 +12,13 @@
         "EventName": "sys_ccn_pmu.read_cycles",
         "Unit": "sys_ccn_pmu",
         "Compat": "0x01"
+    },
+    {
+        "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
+        "NodeType": "0x05",
+        "EventIdCode": "0x01",
+        "EventName": "sys_cmn_pmu.hnf_cache_miss",
+        "Unit": "sys_cmn_pmu",
+        "Compat": "434*;436*;43c*;43a01"
     }
 ]
diff --git a/tools/perf/pmu-events/empty-pmu-events.c b/tools/perf/pmu-events/empty-pmu-events.c
index e74defb..25be18a 100644
--- a/tools/perf/pmu-events/empty-pmu-events.c
+++ b/tools/perf/pmu-events/empty-pmu-events.c
@@ -245,6 +245,14 @@ struct pmu_events_map {
 	.pmu = "uncore_sys_ccn_pmu",
 },
 {
+	.name = "sys_cmn_pmu.hnf_cache_miss",
+	.event = "type=0x05,eventid=0x01",
+	.desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
+	.compat = "434*;436*;43c*;43a01",
+	.topic = "uncore",
+	.pmu = "uncore_sys_cmn_pmu",
+},
+{
 	.name = 0,
 	.event = 0,
 	.desc = 0,
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 3204252..79fb3e2 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -255,9 +255,24 @@ struct perf_pmu_test_pmu {
 	.matching_pmu = "uncore_sys_ccn_pmu4",
 };
 
+static const struct perf_pmu_test_event sys_cmn_pmu_hnf_cache_miss = {
+	.event = {
+		.name = "sys_cmn_pmu.hnf_cache_miss",
+		.event = "type=0x05,eventid=0x01",
+		.desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
+		.topic = "uncore",
+		.pmu = "uncore_sys_cmn_pmu",
+		.compat = "434*;436*;43c*;43a01",
+	},
+	.alias_str = "type=0x5,eventid=0x1",
+	.alias_long_desc = "Counts total cache misses in first lookup result (high priority). Unit: uncore_sys_cmn_pmu ",
+	.matching_pmu = "uncore_sys_cmn_pmu0",
+};
+
 static const struct perf_pmu_test_event *sys_events[] = {
 	&sys_ddr_pmu_write_cycles,
 	&sys_ccn_pmu_read_cycles,
+	&sys_cmn_pmu_hnf_cache_miss,
 	NULL
 };
 
@@ -704,6 +719,46 @@ static int __test_uncore_pmu_event_aliases(struct perf_pmu_test_pmu *test_pmu)
 			&sys_ccn_pmu_read_cycles,
 		},
 	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43401",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43602",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43c03",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	},
+	{
+		.pmu = {
+			.name = (char *)"uncore_sys_cmn_pmu0",
+			.is_uncore = 1,
+			.id = (char *)"43a01",
+		},
+		.aliases = {
+			&sys_cmn_pmu_hnf_cache_miss,
+		},
+	}
 };
 
 /* Test that aliases generated are as expected */
-- 
1.8.3.1

From nobody Wed Dec 17 12:43:45 2025
From: Jing Zhang
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
    Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
    linux-doc@vger.kernel.org, Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 7/8] perf jevents: Add support for Arm CMN PMU aliasing
Date: Mon, 21 Aug 2023 16:36:16 +0800
Message-Id: <1692606977-92009-8-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>
Content-Type: text/plain; charset="utf-8"

For now, add aliases only for the Arm CMN PMU events that are general
and compatible across any SoC with CMN-ANY. The "Compat" value
"434*;436*;43c*;43a*" means compatible with all of CMN600, CMN650,
CMN700 and Ci700; the identifier scheme comes from commit 7819e05a0dce
("perf/arm-cmn: Revamp model detection").

The arm-cmn PMU events were taken from:
[0] https://developer.arm.com/documentation/100180/0302/?lang=en
[1] https://developer.arm.com/documentation/101408/0100/?lang=en
[2] https://developer.arm.com/documentation/102308/0302/?lang=en
[3] https://developer.arm.com/documentation/101569/0300/?lang=en

Signed-off-by: Jing Zhang
Reviewed-by: John Garry
---
 .../pmu-events/arch/arm64/arm/cmn/sys/cmn.json | 266 +++++++++++++++++++++
 tools/perf/pmu-events/jevents.py               |   1 +
 2 files changed, 267 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
new file mode 100644
index 0000000..30435a3
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/cmn.json
@@ -0,0 +1,266 @@
+[
+    {
+        "EventName": "hnf_cache_miss",
+        "EventIdCode": "0x1",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts total cache misses in first lookup result (high priority).",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_slc_sf_cache_access",
+        "EventIdCode": "0x2",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of cache accesses in first access (high priority).",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_cache_fill",
+        "EventIdCode": "0x3",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts total allocations in HN SLC (all cache line allocations to SLC).",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_pocq_retry",
+        "EventIdCode": "0x4",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of retried requests.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_pocq_reqs_recvd",
+        "EventIdCode": "0x5",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of requests that HN receives.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_sf_hit",
+        "EventIdCode": "0x6",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of SF hits.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_sf_evictions",
+        "EventIdCode": "0x7",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of SF eviction cache invalidations initiated.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_dir_snoops_sent",
+        "EventIdCode": "0x8",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of directed snoops sent (not including SF back invalidation).",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_brd_snoops_sent",
+        "EventIdCode": "0x9",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of multicast snoops sent (not including SF back invalidation).",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_slc_eviction",
+        "EventIdCode": "0xa",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of SLC evictions (dirty only).",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_slc_fill_invalid_way",
+        "EventIdCode": "0xb",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of SLC fills to an invalid way.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_mc_retries",
+        "EventIdCode": "0xc",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of retried transactions by the MC.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_mc_reqs",
+        "EventIdCode": "0xd",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of requests that are sent to MC.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hnf_qos_hh_retry",
+        "EventIdCode": "0xe",
+        "NodeType": "0x5",
+        "BriefDescription": "Counts number of times a HighHigh priority request is protocol retried at the HN-F.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_s0_rdata_beats",
+        "EventIdCode": "0x1",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 0. This event measures the read bandwidth, including CMO responses.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_s1_rdata_beats",
+        "EventIdCode": "0x2",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 1. This event measures the read bandwidth, including CMO responses.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_s2_rdata_beats",
+        "EventIdCode": "0x3",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of RData beats (RVALID and RREADY) dispatched on port 2. This event measures the read bandwidth, including CMO responses.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_rxdat_flits",
+        "EventIdCode": "0x4",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of RXDAT flits received. This event measures the true read data bandwidth, excluding CMOs.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_txdat_flits",
+        "EventIdCode": "0x5",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of TXDAT flits dispatched. This event measures the write bandwidth.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_txreq_flits_total",
+        "EventIdCode": "0x6",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of TXREQ flits dispatched. This event measures the total request bandwidth.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "rnid_txreq_flits_retried",
+        "EventIdCode": "0x7",
+        "NodeType": "0xa",
+        "BriefDescription": "Number of retried TXREQ flits dispatched. This event measures the retry rate.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "sbsx_txrsp_retryack",
+        "EventIdCode": "0x4",
+        "NodeType": "0x7",
+        "BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "sbsx_txdat_flitv",
+        "EventIdCode": "0x5",
+        "NodeType": "0x7",
+        "BriefDescription": "Number of TXDAT flits dispatched from XP to SBSX. This event is a measure of the write bandwidth.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "sbsx_arvalid_no_arready",
+        "EventIdCode": "0x21",
+        "NodeType": "0x7",
+        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AR channel.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "sbsx_awvalid_no_awready",
+        "EventIdCode": "0x22",
+        "NodeType": "0x7",
+        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on AW channel.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "sbsx_wvalid_no_wready",
+        "EventIdCode": "0x23",
+        "NodeType": "0x7",
+        "BriefDescription": "Number of cycles the SBSX bridge is stalled because of backpressure on W channel.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_txrsp_retryack",
+        "EventIdCode": "0x2a",
+        "NodeType": "0x4",
+        "BriefDescription": "Number of RXREQ flits dispatched. This event is a measure of the retry rate.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_arvalid_no_arready",
+        "EventIdCode": "0x2b",
+        "NodeType": "0x4",
+        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AR channel.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_arready_no_arvalid",
+        "EventIdCode": "0x2c",
+        "NodeType": "0x4",
+        "BriefDescription": "Number of cycles the AR channel is waiting for new requests from HN-I bridge.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_awvalid_no_awready",
+        "EventIdCode": "0x2d",
+        "NodeType": "0x4",
+        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on AW channel.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_awready_no_awvalid",
+        "EventIdCode": "0x2e",
+        "NodeType": "0x4",
+        "BriefDescription": "Number of cycles the AW channel is waiting for new requests from HN-I bridge.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_wvalid_no_wready",
+        "EventIdCode": "0x2f",
+        "NodeType": "0x4",
+        "BriefDescription": "Number of cycles the HN-I bridge is stalled because of backpressure on W channel.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "EventName": "hni_txdat_stall",
+        "EventIdCode": "0x30",
+        "NodeType": "0x4",
+        "BriefDescription": "TXDAT valid but no link credit available.",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    }
+]
diff --git a/tools/perf/pmu-events/jevents.py b/tools/perf/pmu-events/jevents.py
index 369c8bf..935bd4b 100755
--- a/tools/perf/pmu-events/jevents.py
+++ b/tools/perf/pmu-events/jevents.py
@@ -272,6 +272,7 @@ class JsonEvent:
           'DFPMC': 'amd_df',
           'cpu_core': 'cpu_core',
           'cpu_atom': 'cpu_atom',
+          'arm_cmn': 'arm_cmn',
       }
       return table[unit] if unit in table else f'uncore_{unit.lower()}'
 
-- 
1.8.3.1

From nobody Wed Dec 17 12:43:45 2025
From: Jing Zhang
To: John Garry, Ian Rogers
Cc: Will Deacon, James Clark, Arnaldo Carvalho de Melo, Mark Rutland,
    Mike Leach, Leo Yan, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
    Alexander Shishkin, Jiri Olsa, Adrian Hunter, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
    linux-doc@vger.kernel.org, Zhuo Song, Jing Zhang, Shuai Xue
Subject: [PATCH v7 8/8] perf vendor events: Add JSON metrics for Arm CMN
Date: Mon, 21 Aug 2023 16:36:17 +0800
Message-Id: <1692606977-92009-9-git-send-email-renyu.zj@linux.alibaba.com>
In-Reply-To: <1692606977-92009-1-git-send-email-renyu.zj@linux.alibaba.com>
Content-Type: text/plain; charset="utf-8"

Add JSON metrics for Arm CMN. For now, add only the CMN PMU metrics
that are general and compatible with any SoC containing CMN-ANY.

Signed-off-by: Jing Zhang
Reviewed-by: John Garry
---
 .../pmu-events/arch/arm64/arm/cmn/sys/metric.json | 74 ++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json

diff --git a/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
new file mode 100644
index 0000000..64db534
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cmn/sys/metric.json
@@ -0,0 +1,74 @@
+[
+    {
+        "MetricName": "slc_miss_rate",
+        "BriefDescription": "The system level cache miss rate.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "hnf_cache_miss / hnf_slc_sf_cache_access",
+        "ScaleUnit": "100%",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "hnf_message_retry_rate",
+        "BriefDescription": "HN-F message retry rate indicates whether a lack of credits is causing the bottleneck.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "hnf_pocq_retry / hnf_pocq_reqs_recvd",
+        "ScaleUnit": "100%",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "sf_hit_rate",
+        "BriefDescription": "Snoop filter hit rate can be used to measure the snoop filter efficiency.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "hnf_sf_hit / hnf_slc_sf_cache_access",
+        "ScaleUnit": "100%",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "mc_message_retry_rate",
+        "BriefDescription": "The memory controller request retry rate indicates whether the memory controller is the bottleneck.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "hnf_mc_retries / hnf_mc_reqs",
+        "ScaleUnit": "100%",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "rni_actual_read_bandwidth.all",
+        "BriefDescription": "This metric measures the actual read bandwidth that the RN-I bridge sends to the interconnect.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "rnid_rxdat_flits * 32 / 1e6 / duration_time",
+        "ScaleUnit": "1MB/s",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "rni_actual_write_bandwidth.all",
+        "BriefDescription": "This metric measures the actual write bandwidth at RN-I bridges.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "rnid_txdat_flits * 32 / 1e6 / duration_time",
+        "ScaleUnit": "1MB/s",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "rni_retry_rate",
+        "BriefDescription": "RN-I bridge retry rate indicates whether the memory controller is the bottleneck.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "rnid_txreq_flits_retried / rnid_txreq_flits_total",
+        "ScaleUnit": "100%",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    },
+    {
+        "MetricName": "sbsx_actual_write_bandwidth.all",
+        "BriefDescription": "SBSX actual write bandwidth.",
+        "MetricGroup": "cmn",
+        "MetricExpr": "sbsx_txdat_flitv * 32 / 1e6 / duration_time",
+        "ScaleUnit": "1MB/s",
+        "Unit": "arm_cmn",
+        "Compat": "434*;436*;43c*;43a*"
+    }
+]
-- 
1.8.3.1
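[Editor's note] The matching semantics this series gives "Compat" (a ';'-delimited list of identifier tokens, where a trailing '*' is a wildcard, so "434*;436*;43c*;43a01" matches any CMN600/CMN650/CMN700 identifier plus the exact Ci700 id "43a01") can be sketched compactly in Python. This is an illustrative re-implementation of the behaviour described for pmu_uncore_identifier_match(), not the perf code itself; fnmatch accepts '*' anywhere in a token, a superset of the trailing-only wildcard the series describes.

```python
from fnmatch import fnmatchcase


def compat_matches(pmu_id: str, compat: str) -> bool:
    """Return True if pmu_id matches any ';'-delimited token in compat.

    A token ending in '*' matches any identifier with that prefix;
    other tokens must match the identifier exactly.
    """
    return any(fnmatchcase(pmu_id, tok) for tok in compat.split(";") if tok)


# CMN600 r0p0 ("43401") matches the "434*" token; a CMN650
# identifier such as "43602" matches "436*"; an unrelated id does not.
print(compat_matches("43401", "434*;436*;43c*;43a01"))  # True
print(compat_matches("43602", "434*;436*;43c*;43a01"))  # True
print(compat_matches("50000", "434*;436*;43c*;43a01"))  # False
```

With this scheme one JSON entry covers a whole family of hardware revisions, which is exactly why the test PMUs above reuse a single sys_cmn_pmu_hnf_cache_miss alias across the ids "43401", "43602", "43c03" and "43a01".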