From: Ian Rogers <irogers@google.com>
Date: Fri, 23 Jun 2023 08:10:05 -0700
Subject: [PATCH v2 01/12] perf expr: Add has_event function
Message-Id: <20230623151016.4193660-2-irogers@google.com>
In-Reply-To: <20230623151016.4193660-1-irogers@google.com>

Some events are dependent on firmware/kernel enablement. Allow such
events to be detected when the metric is parsed so that the metric's
event parsing doesn't fail.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim
---
 tools/perf/tests/expr.c |  4 ++++
 tools/perf/util/expr.c  | 21 +++++++++++++++++++++
 tools/perf/util/expr.h  |  1 +
 tools/perf/util/expr.l  |  1 +
 tools/perf/util/expr.y  |  8 +++++++-
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index 3d01eb5e2512..c1c3fcbc2753 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -254,6 +254,10 @@ static int test__expr(struct test_suite *t __maybe_unused, int subtest __maybe_u
 	TEST_ASSERT_VAL("source count", hashmap__size(ctx->ids) == 1);
 	TEST_ASSERT_VAL("source count", hashmap__find(ctx->ids, "EVENT1", &val_ptr));
 
+	/* has_event returns 1 when an event exists. */
+	expr__add_id_val(ctx, strdup("cycles"), 2);
+	ret = test(ctx, "has_event(cycles)", 1);
+
 	expr__ctx_free(ctx);
 
 	return 0;

diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index f4e52919324e..4814262e3805 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -8,6 +8,7 @@
 #include "cpumap.h"
 #include "cputopo.h"
 #include "debug.h"
+#include "evlist.h"
 #include "expr.h"
 #include "expr-bison.h"
 #include "expr-flex.h"
@@ -474,3 +475,23 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
 	pr_debug2("literal: %s = %f\n", literal, result);
 	return result;
 }
+
+/* Does the event 'id' parse? Determine via ctx->ids if possible. */
+double expr__has_event(const struct expr_parse_ctx *ctx, bool compute_ids, const char *id)
+{
+	struct evlist *tmp;
+	double ret;
+
+	if (hashmap__find(ctx->ids, id, /*value=*/NULL))
+		return 1.0;
+
+	if (!compute_ids)
+		return 0.0;
+
+	tmp = evlist__new();
+	if (!tmp)
+		return NAN;
+	ret = parse_event(tmp, id) ? 0 : 1;
+	evlist__delete(tmp);
+	return ret;
+}

diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
index eaa44b24c555..3c1e49b3e35d 100644
--- a/tools/perf/util/expr.h
+++ b/tools/perf/util/expr.h
@@ -54,5 +54,6 @@ int expr__find_ids(const char *expr, const char *one,
 double expr_id_data__value(const struct expr_id_data *data);
 double expr_id_data__source_count(const struct expr_id_data *data);
 double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx);
+double expr__has_event(const struct expr_parse_ctx *ctx, bool compute_ids, const char *id);
 
 #endif

diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
index 4fbf353e78e7..dbb117414710 100644
--- a/tools/perf/util/expr.l
+++ b/tools/perf/util/expr.l
@@ -113,6 +113,7 @@ min		{ return MIN; }
 if		{ return IF; }
 else		{ return ELSE; }
 source_count	{ return SOURCE_COUNT; }
+has_event	{ return HAS_EVENT; }
 {literal}	{ return literal(yyscanner, sctx); }
 {number}	{ return value(yyscanner); }
 {symbol}	{ return str(yyscanner, ID, sctx->runtime); }

diff --git a/tools/perf/util/expr.y b/tools/perf/util/expr.y
index f04963eb6be0..dd504afd8f36 100644
--- a/tools/perf/util/expr.y
+++ b/tools/perf/util/expr.y
@@ -37,7 +37,7 @@
 	} ids;
 }
 
-%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT EXPR_ERROR
+%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT HAS_EVENT EXPR_ERROR
 %left MIN MAX IF
 %left '|'
 %left '^'
@@ -199,6 +199,12 @@ expr: NUMBER
 }
 | ID { $$ = handle_id(ctx, $1, compute_ids, /*source_count=*/false); }
 | SOURCE_COUNT '(' ID ')' { $$ = handle_id(ctx, $3, compute_ids, /*source_count=*/true); }
+| HAS_EVENT '(' ID ')'
+{
+	$$.val = expr__has_event(ctx, compute_ids, $3);
+	$$.ids = NULL;
+	free($3);
+}
 | expr '|' expr
 {
 	if (is_const($1.val) && is_const($3.val)) {
-- 
2.41.0.162.gfafddb0af9-goog
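To illustrate the new grammar rule from the caller's side, here is a
minimal sketch (not part of the patch) that exercises has_event()
through the expr API; it assumes the expr__ctx_new(), expr__add_id_val()
and expr__parse() entry points declared in util/expr.h, as used by the
test above.

  /* Sketch only: exercise has_event() via the expr API in tools/perf. */
  #include <string.h>
  #include "util/expr.h"

  static double guarded_cycles(void)
  {
  	struct expr_parse_ctx *ctx = expr__ctx_new();
  	double result = 0;

  	if (!ctx)
  		return 0;
  	/* "cycles" is in ctx->ids, so has_event(cycles) evaluates to 1. */
  	expr__add_id_val(ctx, strdup("cycles"), 2);
  	if (expr__parse(&result, ctx, "cycles if has_event(cycles) else 0"))
  		result = 0; /* parse failure: treat the metric as unavailable */
  	expr__ctx_free(ctx);
  	return result; /* 2 here, the value bound to "cycles" */
  }

Had "cycles" not been added to the context, the has_event() guard would
select the else branch instead of failing the whole parse.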
From: Ian Rogers <irogers@google.com>
Date: Fri, 23 Jun 2023 08:10:06 -0700
Subject: [PATCH v2 02/12] perf jevents: Support for has_event function
Message-Id: <20230623151016.4193660-3-irogers@google.com>
In-Reply-To: <20230623151016.4193660-1-irogers@google.com>

Add support for the new has_event function in metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim
---
 tools/perf/pmu-events/metric.py | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index af58b74d1644..85a3545f5b6a 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -408,6 +408,12 @@ def source_count(event: Event) -> Function:
   return Function('source_count', event)
 
 
+def has_event(event: Event) -> Function:
+  # pylint: disable=redefined-builtin
+  # pylint: disable=invalid-name
+  return Function('has_event', event)
+
+
 class Metric:
   """An individual metric that will specifiable on the perf command line."""
   groups: Set[str]
@@ -539,7 +545,7 @@ def ParsePerfJson(orig: str) -> Expression:
                r'Event(r"\1")', py)
   py = re.sub(r'#Event\(r"([^"]*)"\)', r'Literal("#\1")', py)
   py = re.sub(r'([0-9]+)Event\(r"(e[0-9]+)"\)', r'\1\2', py)
-  keywords = ['if', 'else', 'min', 'max', 'd_ratio', 'source_count']
+  keywords = ['if', 'else', 'min', 'max', 'd_ratio', 'source_count', 'has_event']
   for kw in keywords:
     py = re.sub(rf'Event\(r"{kw}"\)', kw, py)
 
-- 
2.41.0.162.gfafddb0af9-goog
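As a quick sketch of how the keyword round-trips (assuming metric.py is
importable from tools/perf/pmu-events; Event, ParsePerfJson and
has_event are the names in the diff above):

  from metric import Event, ParsePerfJson, has_event

  # Build an expression tree directly with the new helper...
  guard = has_event(Event(r'cycles\-t'))

  # ...or parse a MetricExpr string: listing 'has_event' as a keyword
  # keeps ParsePerfJson from rewriting it into Event(r"has_event").
  expr = ParsePerfJson(r'has_event(cycles\-t)')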
From: Ian Rogers <irogers@google.com>
Date: Fri, 23 Jun 2023 08:10:07 -0700
Subject: [PATCH v2 03/12] perf vendor metrics intel: Make transaction metrics conditional
Message-Id: <20230623151016.4193660-4-irogers@google.com>
In-Reply-To: <20230623151016.4193660-1-irogers@google.com>

Make the transaction metrics conditional on the cycles-t event being
present. This event may not be present when TSX extensions have been
disabled.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim
---
 tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json | 8 ++++----
 .../pmu-events/arch/x86/cascadelakex/clx-metrics.json     | 8 ++++----
 tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json   | 8 ++++----
 tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json  | 8 ++++----
 .../pmu-events/arch/x86/sapphirerapids/spr-metrics.json   | 8 ++++----
 tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json   | 8 ++++----
 tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json  | 8 ++++----
 tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json | 8 ++++----
 8 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
index 85fb975b6f56..daf9458f0b77 100644
--- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
@@ -92,28 +92,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
index 0e2e446ced7a..fbb111e40829 100644
--- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
@@ -1830,28 +1830,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
index cc4edf855064..8fcc05c4e0a1 100644
--- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json
@@ -1516,28 +1516,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
index 6f25b5b7aaf6..9bb7e3f20f7f 100644
--- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json
@@ -1812,28 +1812,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
index c732982f70b5..c207c851a9f9 100644
--- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
@@ -1938,28 +1938,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
index 2ed88842b880..94cb38540b5a 100644
--- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
@@ -1466,28 +1466,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
index 507d39efacc8..fa4209809c57 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
@@ -1774,28 +1774,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"

diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
index 83346911aa63..c7c2d6ab1a93 100644
--- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
+++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
@@ -1530,28 +1530,28 @@
     },
     {
         "BriefDescription": "Percentage of cycles in aborted transactions.",
-        "MetricExpr": "max(cpu@cycles\\-t@ - cpu@cycles\\-ct@, 0) / cycles",
+        "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_aborted_cycles",
         "ScaleUnit": "100%"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of elisions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@el\\-start@",
+        "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_elision",
         "ScaleUnit": "1cycles / elision"
     },
     {
         "BriefDescription": "Number of cycles within a transaction divided by the number of transactions.",
-        "MetricExpr": "cpu@cycles\\-t@ / cpu@tx\\-start@",
+        "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_cycles_per_transaction",
         "ScaleUnit": "1cycles / transaction"
     },
     {
         "BriefDescription": "Percentage of cycles within a transaction region.",
-        "MetricExpr": "cpu@cycles\\-t@ / cycles",
+        "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
         "MetricGroup": "transaction",
         "MetricName": "tsx_transactional_cycles",
         "ScaleUnit": "100%"
-- 
2.41.0.162.gfafddb0af9-goog
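The rewritten expressions use the expr language's "A if COND else B"
form. As a plain-Python illustration of the intended semantics
(hypothetical numbers, not part of the patch): a missing cycles-t event
now yields 0 rather than making the whole metric fail to parse.

  def tsx_transactional_cycles(cycles_t, cycles, cycles_t_present):
      # Mirrors: (cycles\-t / cycles if has_event(cycles\-t) else 0)
      return cycles_t / cycles if cycles_t_present else 0

  print(tsx_transactional_cycles(120, 1000, cycles_t_present=True))   # 0.12
  print(tsx_transactional_cycles(0, 1000, cycles_t_present=False))    # 0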
From: Ian Rogers <irogers@google.com>
Date: Fri, 23 Jun 2023 08:10:08 -0700
Subject: [PATCH v2 04/12] perf vendor events intel: Add rocketlake events/metrics
Message-Id: <20230623151016.4193660-5-irogers@google.com>
In-Reply-To: <20230623151016.4193660-1-irogers@google.com>

Add the RocketLake events added to Intel perfmon in:
https://github.com/intel/perfmon/commit/f5f47dc938d81d5cc491cf8ac1f90bee18e238cf

mapfile.csv is updated accordingly, with a CPUID that was previously
matched by the icelake entry.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim
---
 tools/perf/pmu-events/arch/x86/mapfile.csv    |    3 +-
 .../pmu-events/arch/x86/rocketlake/cache.json |  894 ++++++++++
 .../arch/x86/rocketlake/floating-point.json   |  105 ++
 .../arch/x86/rocketlake/frontend.json         |  377 ++++
 .../arch/x86/rocketlake/memory.json           |  394 +++++
 .../arch/x86/rocketlake/metricgroups.json     |  113 ++
 .../pmu-events/arch/x86/rocketlake/other.json |  242 +++
 .../arch/x86/rocketlake/pipeline.json         |  801 +++++++++
 .../arch/x86/rocketlake/rkl-metrics.json      | 1571 +++++++++++++++++
 .../x86/rocketlake/uncore-interconnect.json   |   74 +
 .../arch/x86/rocketlake/uncore-other.json     |    9 +
 .../arch/x86/rocketlake/virtual-memory.json   |  165 ++
 12 files changed, 4747 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 6543a68d4a17..a30f2483d99e 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -13,7 +13,7 @@ GenuineIntel-6-B6,v1.00,grandridge,core
 GenuineIntel-6-A[DE],v1.01,graniterapids,core
 GenuineIntel-6-(3C|45|46),v33,haswell,core
 GenuineIntel-6-3F,v27,haswellx,core
-GenuineIntel-6-(7D|7E|A7),v1.18,icelake,core
+GenuineIntel-6-7[DE],v1.18,icelake,core
 GenuineIntel-6-6[AC],v1.20,icelakex,core
 GenuineIntel-6-3A,v24,ivybridge,core
 GenuineIntel-6-3E,v23,ivytown,core
@@ -22,6 +22,7 @@ GenuineIntel-6-(57|85),v10,knightslanding,core
 GenuineIntel-6-A[AC],v1.01,meteorlake,core
 GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v3,nehalemex,core
+GenuineIntel-6-A7,v1.01,rocketlake,core
 GenuineIntel-6-2A,v19,sandybridge,core
 GenuineIntel-6-(8F|CF),v1.13,sapphirerapids,core
 GenuineIntel-6-AF,v1.00,sierraforest,core

diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/cache.json b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
new file mode 100644
index 000000000000..b0f54a6650fe
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/rocketlake/cache.json
@@ -0,0 +1,894 @@
+[
+    {
+        "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.",
+        "EventCode": "0x51",
+        "EventName": "L1D.REPLACEMENT",
+        "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.",
+        "EventCode": "0x48",
+        "EventName": "L1D_PEND_MISS.FB_FULL",
+        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailability.",
+        "CounterMask": "1",
+        "EdgeDetect": "1",
+        "EventCode": "0x48",
+        "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS",
+        "PublicDescription": "Counts number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of cycles a demand request has waited due to L1D due to lack of L2 resources.",
+        "EventCode": "0x48",
+        "EventName": "L1D_PEND_MISS.L2_STALL",
+        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D due to lack of L2 resources. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Number of L1D misses that are outstanding",
+        "EventCode": "0x48",
+        "EventName": "L1D_PEND_MISS.PENDING",
+        "PublicDescription": "Counts number of L1D misses that are outstanding in each cycle, that is each cycle the number of Fill Buffers (FB) outstanding required by Demand Reads. FB either is held by demand loads, or it is held by non-demand loads and gets hit at least once by demand. The valid outstanding interval is defined until the FB deallocation by one of the following ways: from FB allocation, if FB is allocated by demand from the demand Hit FB, if it is allocated by hardware or software prefetch.
Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles with L1D load Misses outstanding.", + "CounterMask": "1", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.PENDING_CYCLES", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "L2 cache lines filling L2", + "EventCode": "0xF1", + "EventName": "L2_LINES_IN.ALL", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", + "SampleAfterValue": "100003", + "UMask": "0x1f" + }, + { + "BriefDescription": "Modified cache lines that are evicted by L2 c= ache when triggered by an L2 cache fill.", + "EventCode": "0xF2", + "EventName": "L2_LINES_OUT.NON_SILENT", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. Modified lines are written back to L3", + "SampleAfterValue": "200003", + "UMask": "0x2" + }, + { + "BriefDescription": "Non-modified cache lines that are silently dr= opped by L2 cache when triggered by an L2 cache fill.", + "EventCode": "0xF2", + "EventName": "L2_LINES_OUT.SILENT", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache when triggered by an L2 cache fill. These lines are ty= pically in Shared or Exclusive state. A non-threaded event.", + "SampleAfterValue": "200003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cache lines that have been L2 hardware prefet= ched but not used by demand accesses", + "EventCode": "0xf2", + "EventName": "L2_LINES_OUT.USELESS_HWPF", + "PublicDescription": "Counts the number of cache lines that have b= een prefetched by the L2 hardware prefetcher but not used by demand access = when evicted from the L2 cache", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "L2 code requests", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_CODE_RD", + "PublicDescription": "Counts the total number of L2 code requests.= ", + "SampleAfterValue": "200003", + "UMask": "0xe4" + }, + { + "BriefDescription": "Demand Data Read requests", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", + "PublicDescription": "Counts the number of demand Data Read reques= ts (including requests from L1D hardware prefetchers). These loads may hit = or miss L2 cache. Only non rejected loads are counted.", + "SampleAfterValue": "200003", + "UMask": "0xe1" + }, + { + "BriefDescription": "Demand requests that miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_MISS", + "PublicDescription": "Counts demand requests that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x27" + }, + { + "BriefDescription": "Demand requests to L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", + "PublicDescription": "Counts demand requests to L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xe7" + }, + { + "BriefDescription": "RFO requests to L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_RFO", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. 
L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", + "SampleAfterValue": "200003", + "UMask": "0xe2" + }, + { + "BriefDescription": "L2 cache hits when fetching instructions, cod= e reads.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.CODE_RD_HIT", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", + "SampleAfterValue": "200003", + "UMask": "0xc4" + }, + { + "BriefDescription": "L2 cache misses when fetching instructions", + "EventCode": "0x24", + "EventName": "L2_RQSTS.CODE_RD_MISS", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", + "SampleAfterValue": "200003", + "UMask": "0x24" + }, + { + "BriefDescription": "Demand Data Read requests that hit L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xc1" + }, + { + "BriefDescription": "Demand Data Read miss L2, no rejects", + "EventCode": "0x24", + "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", + "PublicDescription": "Counts the number of demand Data Read reques= ts that miss L2 cache. Only not rejected loads are counted.", + "SampleAfterValue": "200003", + "UMask": "0x21" + }, + { + "BriefDescription": "All requests that miss L2 cache.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.MISS", + "PublicDescription": "Counts all requests that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x3f" + }, + { + "BriefDescription": "All L2 requests.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.REFERENCES", + "PublicDescription": "Counts all L2 requests.", + "SampleAfterValue": "200003", + "UMask": "0xff" + }, + { + "BriefDescription": "RFO requests that hit L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.RFO_HIT", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xc2" + }, + { + "BriefDescription": "RFO requests that miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.RFO_MISS", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x22" + }, + { + "BriefDescription": "SW prefetch requests that hit L2 cache.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.SWPF_HIT", + "PublicDescription": "Counts Software prefetch requests that hit t= he L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when = FB is not full.", + "SampleAfterValue": "200003", + "UMask": "0xc8" + }, + { + "BriefDescription": "SW prefetch requests that miss L2 cache.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.SWPF_MISS", + "PublicDescription": "Counts Software prefetch requests that miss = the L2 cache. 
Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when= FB is not full.", + "SampleAfterValue": "200003", + "UMask": "0x28" + }, + { + "BriefDescription": "L2 writebacks that access L2 cache", + "EventCode": "0xF0", + "EventName": "L2_TRANS.L2_WB", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x40" + }, + { + "BriefDescription": "Core-originated cacheable requests that misse= d L3 (Except hardware prefetches to the L3)", + "EventCode": "0x2e", + "EventName": "LONGEST_LAT_CACHE.MISS", + "PublicDescription": "Counts core-originated cacheable requests th= at miss the L3 cache (Longest Latency cache). Requests include data and cod= e reads, Reads-for-Ownership (RFOs), speculative accesses and hardware pref= etches to the L1 and L2. It does not include hardware prefetches to the L3= , and may not count other types of requests to the L3.", + "SampleAfterValue": "100003", + "UMask": "0x41" + }, + { + "BriefDescription": "Retired load instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ALL_LOADS", + "PEBS": "1", + "PublicDescription": "Counts all retired load instructions. This e= vent accounts for SW prefetch instructions of PREFETCHNTA or PREFETCHT0/1/2= or PREFETCHW.", + "SampleAfterValue": "1000003", + "UMask": "0x81" + }, + { + "BriefDescription": "Retired store instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ALL_STORES", + "PEBS": "1", + "PublicDescription": "Counts all retired store instructions.", + "SampleAfterValue": "1000003", + "UMask": "0x82" + }, + { + "BriefDescription": "All retired memory instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ANY", + "PEBS": "1", + "PublicDescription": "Counts all retired memory instructions - loa= ds and stores.", + "SampleAfterValue": "1000003", + "UMask": "0x83" + }, + { + "BriefDescription": "Retired load instructions with locked access.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.LOCK_LOADS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with locked= access.", + "SampleAfterValue": "100007", + "UMask": "0x21" + }, + { + "BriefDescription": "Retired load instructions that split across a= cacheline boundary.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary.", + "SampleAfterValue": "100003", + "UMask": "0x41" + }, + { + "BriefDescription": "Retired store instructions that split across = a cacheline boundary.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.SPLIT_STORES", + "PEBS": "1", + "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary.", + "SampleAfterValue": "100003", + "UMask": "0x42" + }, + { + "BriefDescription": "Retired load instructions that miss the STLB.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", + "PEBS": "1", + "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x11" + }, + { + "BriefDescription": "Retired store instructions that miss the STLB= .", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", + "PEBS": "1", + "PublicDescription": "Number of retired store 
instructions that (s= tart a) miss in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x12" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were L3 and cross-core snoop hits in on-pkg core cache", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache.", + "SampleAfterValue": "20011", + "UMask": "0x2" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were HitM responses from shared L3", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3.", + "SampleAfterValue": "20011", + "UMask": "0x4" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were L3 hit and cross-core snoop missed in on-pkg core cache.", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", + "PEBS": "1", + "PublicDescription": "Counts the retired load instructions whose d= ata sources were L3 hit and cross-core snoop missed in on-pkg core cache.", + "SampleAfterValue": "20011", + "UMask": "0x1" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were hits in L3 without snoops required", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were hits in L3 without snoops required.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Retired instructions with at least 1 uncachea= ble load or Bus Lock.", + "Data_LA": "1", + "EventCode": "0xd4", + "EventName": "MEM_LOAD_MISC_RETIRED.UC", + "PEBS": "1", + "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock).", + "SampleAfterValue": "100007", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of completed demand load requests that= missed the L1, but hit the FB(fill buffer), because a preceding miss to th= e same cacheline initiated the line to be brought into L1, but data is not = yet ready in L1.", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.FB_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding= miss to the same cache line with data not ready.", + "SampleAfterValue": "100007", + "UMask": "0x40" + }, + { + "BriefDescription": "Retired load instructions with L1 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L1_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. 
This event includes all SW prefetches and lock instructions regardless of the data source.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired load instructions missed L1 cache as data sources",
+        "Data_LA": "1",
+        "EventCode": "0xd1",
+        "EventName": "MEM_LOAD_RETIRED.L1_MISS",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired load instructions with at least one uop that missed in the L1 cache.",
+        "SampleAfterValue": "200003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Retired load instructions with L2 cache hits as data sources",
+        "Data_LA": "1",
+        "EventCode": "0xd1",
+        "EventName": "MEM_LOAD_RETIRED.L2_HIT",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired load instructions with L2 cache hits as data sources.",
+        "SampleAfterValue": "200003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Retired load instructions missed L2 cache as data sources",
+        "Data_LA": "1",
+        "EventCode": "0xd1",
+        "EventName": "MEM_LOAD_RETIRED.L2_MISS",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired load instructions missed L2 cache as data sources.",
+        "SampleAfterValue": "100021",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Retired load instructions with L3 cache hits as data sources",
+        "Data_LA": "1",
+        "EventCode": "0xd1",
+        "EventName": "MEM_LOAD_RETIRED.L3_HIT",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired load instructions with at least one uop that hit in the L3 cache.",
+        "SampleAfterValue": "100021",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Retired load instructions missed L3 cache as data sources",
+        "Data_LA": "1",
+        "EventCode": "0xd1",
+        "EventName": "MEM_LOAD_RETIRED.L3_MISS",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired load instructions with at least one uop that missed in the L3 cache.",
+        "SampleAfterValue": "50021",
+        "UMask": "0x20"
+    },
+    {
+        "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0004",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop hit in another core's caches, data forwarding is required as the data is modified.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_HITM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10003C0004",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_HIT_NO_FWD",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x4003C0004",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C0004",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C0004",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was sent.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_SENT",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1E003C0004",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop hit in another core's caches, data forwarding is required as the data is modified.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10003C0001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x4003C0001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C0001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C0001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was sent.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_SENT",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1E003C0001",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_RFO.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop hit in another core's caches, data forwarding is required as the data is modified.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10003C0002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HIT_NO_FWD",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x4003C0002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C0002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C0002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was sent.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_SENT",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1E003C0002",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0400",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C0400",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C0400",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0010",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another core's caches, data forwarding is required as the data is modified.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_HITM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10003C0010",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x4003C0010",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C0010",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C0010",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_SENT",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1E003C0010",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_RFO.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0020",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another core's caches, data forwarding is required as the data is modified.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_HITM",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x10003C0020",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_HIT_NO_FWD",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x4003C0020",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C0020",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C0020",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_SENT",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1E003C0020",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts hardware prefetches to the L3 only that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.HWPF_L3.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C2380",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.OTHER.L3_HIT.SNOOP_HIT_NO_FWD",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x4003C8000",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.OTHER.L3_HIT.SNOOP_MISS",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x2003C8000",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.OTHER.L3_HIT.SNOOP_NOT_NEEDED",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1003C8000",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop was sent.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.OTHER.L3_HIT.SNOOP_SENT",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x1E003C8000",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts streaming stores that hit a cacheline in the L3 where a snoop was sent or not.",
+        "EventCode": "0xB7, 0xBB",
+        "EventName": "OCR.STREAMING_WR.L3_HIT.ANY",
+        "MSRIndex": "0x1a6,0x1a7",
+        "MSRValue": "0x3FC03C0800",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Demand and prefetch data reads",
+        "EventCode": "0xB0",
+        "EventName": "OFFCORE_REQUESTS.ALL_DATA_RD",
+        "PublicDescription": "Counts the demand and prefetch data reads. All Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 prefetchers). Counting also covers reads due to page walks resulting from any request type.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Counts memory transactions sent to the uncore.",
+        "EventCode": "0xB0",
+        "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS",
+        "PublicDescription": "Counts memory transactions sent to the uncore including requests initiated by the core, all L3 prefetches, reads resulting from page walks, and snoop responses.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x80"
+    },
+    {
+        "BriefDescription": "Demand Data Read requests sent to uncore",
+        "EventCode": "0xb0",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD",
+        "PublicDescription": "Counts the Demand Data Read requests sent to uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determine average latency in the uncore.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Demand RFO requests including regular RFOs, locks, ItoM",
+        "EventCode": "0xb0",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_RFO",
+        "PublicDescription": "Counts the demand RFO (read for ownership) requests including regular RFOs, locks, ItoM.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "For every cycle, increments by the number of outstanding data read requests pending.",
+        "EventCode": "0x60",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD",
+        "PublicDescription": "For every cycle, increments by the number of outstanding data read requests pending. Data read requests include cacheable demand reads and L2 prefetches, but do not include RFOs, code reads or prefetches to the L3. Reads due to page walks resulting from any request type will also be counted. Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Cycles where at least 1 outstanding data read request is pending.",
+        "CounterMask": "1",
+        "EventCode": "0x60",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD",
+        "PublicDescription": "Cycles where at least 1 outstanding data read request is pending. Data read requests include cacheable demand reads and L2 prefetches, but do not include RFOs, code reads or prefetches to the L3. Reads due to page walks resulting from any request type will also be counted. Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Cycles where at least 1 outstanding Demand RFO request is pending.",
+        "CounterMask": "1",
+        "EventCode": "0x60",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO",
+        "PublicDescription": "Cycles where at least 1 outstanding Demand RFO request is pending. RFOs are initiated by a core as part of a data store operation. Demand RFO requests include RFOs, locks, and ItoM transactions. Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "For every cycle, increments by the number of outstanding demand data read requests pending.",
+        "EventCode": "0x60",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD",
+        "PublicDescription": "For every cycle, increments by the number of outstanding demand data read requests pending. Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Store Read transactions pending for off-core. Highly correlated.",
+        "EventCode": "0x60",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO",
+        "PublicDescription": "Counts the number of off-core outstanding read-for-ownership (RFO) store transactions every cycle. An RFO transaction is considered to be in the Off-core outstanding state between L2 cache miss and transaction completion.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Counts bus locks, accounts for cache line split locks and UC locks.",
+        "EventCode": "0xF4",
+        "EventName": "SQ_MISC.BUS_LOCK",
+        "PublicDescription": "Counts the more expensive bus lock needed to enforce cache coherency for certain memory accesses that need to be done atomically. Can be created by issuing an atomic instruction (via the LOCK prefix) which causes a cache line split or accesses uncacheable memory.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Cycles the queue waiting for offcore responses is full.",
+        "EventCode": "0xf4",
+        "EventName": "SQ_MISC.SQ_FULL",
+        "PublicDescription": "Counts the cycles for which the thread is active and the queue waiting for responses from the uncore cannot take any more entries.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHNTA instructions executed.",
+        "EventCode": "0x32",
+        "EventName": "SW_PREFETCH_ACCESS.NTA",
+        "PublicDescription": "Counts the number of PREFETCHNTA instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHW instructions executed.",
+        "EventCode": "0x32",
+        "EventName": "SW_PREFETCH_ACCESS.PREFETCHW",
+        "PublicDescription": "Counts the number of PREFETCHW instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHT0 instructions executed.",
+        "EventCode": "0x32",
+        "EventName": "SW_PREFETCH_ACCESS.T0",
+        "PublicDescription": "Counts the number of PREFETCHT0 instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of PREFETCHT1 or PREFETCHT2 instructions executed.",
+        "EventCode": "0x32",
+        "EventName": "SW_PREFETCH_ACCESS.T1_T2",
+        "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT2 instructions executed.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    }
+]
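As the OFFCORE_REQUESTS.DEMAND_DATA_RD description above suggests, average uncore read latency falls out of pairing it with OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD (occupancy divided by requests). A minimal sketch of that pairing via the generic perf_event_open(2) raw interface; it assumes the usual Intel core-PMU raw encoding ((umask << 8) | event) with the event/umask values from the JSON above, and it elides error handling and counter multiplexing:

	#include <linux/perf_event.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int open_raw(uint64_t config)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_RAW;
		attr.config = config;
		attr.disabled = 1;
		/* pid = 0, cpu = -1: measure this thread on any CPU. */
		return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	}

	int main(void)
	{
		/* OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD: event 0x60, umask 0x1 */
		int out_fd = open_raw(0x0160);
		/* OFFCORE_REQUESTS.DEMAND_DATA_RD: event 0xb0, umask 0x1 */
		int req_fd = open_raw(0x01b0);
		uint64_t outstanding = 0, requests = 0;

		ioctl(out_fd, PERF_EVENT_IOC_ENABLE, 0);
		ioctl(req_fd, PERF_EVENT_IOC_ENABLE, 0);
		/* ... workload under measurement runs here ... */
		ioctl(out_fd, PERF_EVENT_IOC_DISABLE, 0);
		ioctl(req_fd, PERF_EVENT_IOC_DISABLE, 0);
		read(out_fd, &outstanding, sizeof(outstanding));
		read(req_fd, &requests, sizeof(requests));
		if (requests)
			printf("avg demand read latency: %.1f cycles\n",
			       (double)outstanding / (double)requests);
		return 0;
	}

The same ratio works for the RFO flavor of the two events; only the configs change.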
diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json b/tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json
new file mode 100644
index 000000000000..85c26c889088
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json
@@ -0,0 +1,105 @@
+[
+    {
+        "BriefDescription": "Counts all microcode FP assists.",
+        "EventCode": "0xc1",
+        "EventName": "ASSISTS.FP",
+        "PublicDescription": "Counts all microcode Floating Point assists.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
+        "PublicDescription": "Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE",
+        "PublicDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
+        "PublicDescription": "Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE",
+        "PublicDescription": "Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x20"
+    },
+    {
+        "BriefDescription": "Number of SSE/AVX computational 128-bit packed single and 256-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations, 1 for each element. Applies to SSE* and AVX* packed single precision and packed double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS",
+        "PublicDescription": "Number of SSE/AVX computational 128-bit packed single precision and 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point and packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x18"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE",
+        "PublicDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x40"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE",
+        "PublicDescription": "Number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x80"
+    },
+    {
+        "BriefDescription": "Number of SSE/AVX computational 256-bit packed single precision and 512-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, 1 for each element. Applies to SSE* and AVX* packed single precision and double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS",
+        "PublicDescription": "Number of SSE/AVX computational 256-bit packed single precision and 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision and double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x60"
+    },
+    {
+        "BriefDescription": "Number of SSE/AVX computational scalar floating-point instructions retired; some instructions will count twice as noted below. Applies to SSE* and AVX* scalar, double and single precision floating-point: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR",
+        "PublicDescription": "Number of SSE/AVX computational scalar single precision and double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x3"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "PublicDescription": "Number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Counts number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+        "PublicDescription": "Number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of any Vector retired FP arithmetic instructions",
+        "EventCode": "0xc7",
+        "EventName": "FP_ARITH_INST_RETIRED.VECTOR",
+        "SampleAfterValue": "1000003",
+        "UMask": "0xfc"
+    }
+]
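The per-count operation weights spelled out in these descriptions (1 for scalar, 2/4/8/16 by packed width) are exactly what FLOPs metrics are built from. A sketch of the usual weighted reduction; the counter values are assumed to have been read out already, and the variable names are illustrative, not identifiers from this patch:

	#include <stdint.h>

	/*
	 * Weighted sum of the FP_ARITH_INST_RETIRED.* counts described above.
	 * Each packed count represents one retired instruction, i.e. "width"
	 * FLOPs; the DPP/FM(N)ADD/SUB double-count caveat in the descriptions
	 * still applies to the inputs.
	 */
	static uint64_t fp_arith_flops(uint64_t scalar, /* .SCALAR:             1 op  */
				       uint64_t pd128,  /* .128B_PACKED_DOUBLE:  2 ops */
				       uint64_t ps128,  /* .128B_PACKED_SINGLE:  4 ops */
				       uint64_t pd256,  /* .256B_PACKED_DOUBLE:  4 ops */
				       uint64_t ps256,  /* .256B_PACKED_SINGLE:  8 ops */
				       uint64_t pd512,  /* .512B_PACKED_DOUBLE:  8 ops */
				       uint64_t ps512)  /* .512B_PACKED_SINGLE: 16 ops */
	{
		return scalar + 2 * pd128 + 4 * (ps128 + pd256) +
		       8 * (ps256 + pd512) + 16 * ps512;
	}

The combined 4_FLOPS (umask 0x18) and 8_FLOPS (umask 0x60) events fold the 4-op and 8-op terms into single counters, which is why their umasks are unions of the per-width umasks.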
diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/frontend.json b/tools/perf/pmu-events/arch/x86/rocketlake/frontend.json
new file mode 100644
index 000000000000..2b539a08d2bf
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/rocketlake/frontend.json
@@ -0,0 +1,377 @@
+[
+    {
+        "BriefDescription": "Counts the total number of times the front-end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end.",
+        "EventCode": "0xe6",
+        "EventName": "BACLEARS.ANY",
+        "PublicDescription": "Counts the number of times the front-end is resteered when it finds a branch instruction in a fetch line. This occurs for the first time a branch instruction is fetched or when the branch is not tracked by the BPU (Branch Prediction Unit) anymore.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
+        "EventCode": "0x87",
+        "EventName": "DECODE.LCP",
+        "PublicDescription": "Counts cycles where Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
+        "SampleAfterValue": "500009",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.",
+        "CounterMask": "1",
+        "EdgeDetect": "1",
+        "EventCode": "0xab",
+        "EventName": "DSB2MITE_SWITCHES.COUNT",
+        "PublicDescription": "Counts the number of Decode Stream Buffer (DSB a.k.a. Uop Cache)-to-MITE speculative transitions.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "DSB-to-MITE switch true penalty cycles.",
+        "EventCode": "0xab",
+        "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES",
+        "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache that holds translations of previously fetched instructions that were decoded by the legacy x86 decode pipeline (MITE). This event counts fetch penalty cycles when a transition occurs from DSB to MITE.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced DSB miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x1",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced DSB (Decode stream buffer i.e. the decoded instruction-cache) miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced a critical DSB miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.DSB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x11",
+        "PEBS": "1",
+        "PublicDescription": "Number of retired Instructions that experienced a critical DSB (Decode stream buffer i.e. the decoded instruction-cache) miss. Critical means stalls were exposed to the back-end as a result of the DSB miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced iTLB true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.ITLB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x14",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced iTLB (Instruction TLB) true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced Instruction L1 Cache true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.L1I_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x12",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced Instruction L1 Cache true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced Instruction L2 Cache true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.L2_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x13",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced Instruction L2 Cache true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions after front-end starvation of at least 1 cycle",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_1",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x500106",
+        "PEBS": "1",
+        "PublicDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of at least 1 cycle which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_128",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x508006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 16 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_16",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x501006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 16 cycles. During this period the front-end delivered no uops.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions after front-end starvation of at least 2 cycles",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_2",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x500206",
+        "PEBS": "1",
+        "PublicDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of at least 2 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_256",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x510006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end had at least 1 bubble-slot for a period of 2 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x100206",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after the front-end had at least 1 bubble-slot for a period of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there was no RAT stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 32 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_32",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x502006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 32 cycles. During this period the front-end delivered no uops.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_4",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x500406",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_512",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x520006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_64",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x504006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 8 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_8",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x500806",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 8 cycles. During this period the front-end delivered no uops.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced STLB (2nd level TLB) true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.STLB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x15",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced STLB (2nd level TLB) true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]",
+        "EventCode": "0x80",
+        "EventName": "ICACHE_16B.IFDATA_STALL",
+        "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]",
+        "SampleAfterValue": "500009",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity.",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_64B.IFTAG_HIT",
+        "PublicDescription": "Counts instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.",
+        "SampleAfterValue": "200003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity.",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_64B.IFTAG_MISS",
+        "PublicDescription": "Counts instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.",
+        "SampleAfterValue": "200003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_64B.IFTAG_STALL",
+        "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
+        "SampleAfterValue": "200003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]",
+        "EventCode": "0x80",
+        "EventName": "ICACHE_DATA.STALLS",
+        "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]",
+        "SampleAfterValue": "500009",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
+        "EventCode": "0x83",
+        "EventName": "ICACHE_TAG.STALLS",
+        "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
+        "SampleAfterValue": "200003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
+        "CounterMask": "1",
+        "EventCode": "0x79",
+        "EventName": "IDQ.DSB_CYCLES_ANY",
+        "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Cycles DSB is delivering optimal number of Uops",
+        "CounterMask": "5",
+        "EventCode": "0x79",
+        "EventName": "IDQ.DSB_CYCLES_OK",
+        "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. During these cycles uops are not being delivered from the MITE (legacy decode pipeline) path.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path",
+        "EventCode": "0x79",
+        "EventName": "IDQ.DSB_UOPS",
+        "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x8"
+    },
+    {
+        "BriefDescription": "Cycles MITE is delivering any Uop",
+        "CounterMask": "1",
+        "EventCode": "0x79",
+        "EventName": "IDQ.MITE_CYCLES_ANY",
+        "PublicDescription": "Counts the number of cycles uops were delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles MITE is delivering optimal number of Uops",
+        "CounterMask": "5",
+        "EventCode": "0x79",
+        "EventName": "IDQ.MITE_CYCLES_OK",
+        "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from MITE path",
+        "EventCode": "0x79",
+        "EventName": "IDQ.MITE_UOPS",
+        "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the MITE path. This also means that uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Cycles when uops are being delivered to IDQ while MS is busy",
+        "CounterMask": "1",
+        "EventCode": "0x79",
+        "EventName": "IDQ.MS_CYCLES_ANY",
+        "PublicDescription": "Counts cycles during which uops are being delivered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS) is busy. Uops may be initiated by Decode Stream Buffer (DSB) or MITE.",
+        "SampleAfterValue": "2000003",
+        "UMask": "0x30"
+    },
+    {
+        "BriefDescription": "Number of switches from DSB or MITE to the MS",
+        "CounterMask": "1",
+        "EdgeDetect": "1",
+        "EventCode": "0x79",
+        "EventName": "IDQ.MS_SWITCHES",
+        "PublicDescription": "Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x30"
+    },
+    {
+        "BriefDescription": "Uops delivered to IDQ while MS is busy",
+        "EventCode": "0x79",
+        "EventName": "IDQ.MS_UOPS",
+        "PublicDescription": "Counts the total number of uops delivered by the Microcode Sequencer (MS). Any instruction over 4 uops will be delivered by the MS. Some instructions such as transcendentals may additionally generate uops from the MS.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x30"
+    },
+    {
+        "BriefDescription": "Uops not delivered by IDQ when backend of the machine is not stalled",
+        "EventCode": "0x9c",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE",
+        "PublicDescription": "Counts the number of uops not delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there were no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Cycles when no uops are delivered by the IDQ when backend of the machine is not stalled",
+        "CounterMask": "5",
+        "EventCode": "0x9c",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE",
+        "PublicDescription": "Counts the number of cycles when no uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there were no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled",
+        "CounterMask": "1",
+        "EventCode": "0x9C",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK",
+        "Invert": "1",
+        "PublicDescription": "Counts the number of cycles when the optimal number of uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there were no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1"
+    }
+]
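The IDQ.DSB_UOPS, IDQ.MITE_UOPS and IDQ.MS_UOPS descriptions above partition uop delivery by source, so a common derived figure is "DSB coverage": the fraction of delivered uops that came from the uop cache. A rough sketch, assuming the three counts were sampled over the same interval and that these three sources account for all delivered uops:

	#include <stdint.h>

	/*
	 * Fraction of IDQ uops delivered from the Decode Stream Buffer, built
	 * from the three IDQ.*_UOPS counts described above. Returns 0.0 when
	 * nothing was delivered in the interval.
	 */
	static double dsb_coverage(uint64_t dsb_uops, uint64_t mite_uops,
				   uint64_t ms_uops)
	{
		uint64_t total = dsb_uops + mite_uops + ms_uops;

		return total ? (double)dsb_uops / (double)total : 0.0;
	}

A low coverage figure, together with the DSB2MITE_SWITCHES events above, points at code whose footprint or alignment defeats the uop cache.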
{ + "BriefDescription": "Number of times an HLE execution successfully= committed", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.COMMIT", + "PublicDescription": "Counts the number of times HLE commit succee= ded.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times an HLE execution started.", + "EventCode": "0xc8", + "EventName": "HLE_RETIRED.START", + "PublicDescription": "Counts the number of times we entered an HLE= region. Does not count nested transactions.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of machine clears due to memory orderi= ng conflicts.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", + "PublicDescription": "Counts the number of Machine Clears detected= dye to memory ordering. Memory Ordering Machine Clears may apply when a me= mory read may not conform to the memory ordering rules of the x86 architect= ure", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 128 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128", + "MSRIndex": "0x3F6", + "MSRValue": "0x80", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 128 cycles. Reported= latency may be longer than just the memory latency.", + "SampleAfterValue": "1009", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 16 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16", + "MSRIndex": "0x3F6", + "MSRValue": "0x10", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 16 cycles. Reported = latency may be longer than just the memory latency.", + "SampleAfterValue": "20011", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 256 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256", + "MSRIndex": "0x3F6", + "MSRValue": "0x100", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 256 cycles. Reported= latency may be longer than just the memory latency.", + "SampleAfterValue": "503", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 32 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32", + "MSRIndex": "0x3F6", + "MSRValue": "0x20", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 32 cycles. 
Reported = latency may be longer than just the memory latency.", + "SampleAfterValue": "100007", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 4 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4", + "MSRIndex": "0x3F6", + "MSRValue": "0x4", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 4 cycles. Reported l= atency may be longer than just the memory latency.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 512 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512", + "MSRIndex": "0x3F6", + "MSRValue": "0x200", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 512 cycles. Reported= latency may be longer than just the memory latency.", + "SampleAfterValue": "101", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 64 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64", + "MSRIndex": "0x3F6", + "MSRValue": "0x40", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 64 cycles. Reported = latency may be longer than just the memory latency.", + "SampleAfterValue": "2003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 8 cycles.", + "Data_LA": "1", + "EventCode": "0xcd", + "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8", + "MSRIndex": "0x3F6", + "MSRValue": "0x8", + "PEBS": "2", + "PublicDescription": "Counts randomly selected loads when the late= ncy from first dispatch to completion is greater than 8 cycles. 
Reported l= atency may be longer than just the memory latency.", + "SampleAfterValue": "50021", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 inst= ruction cache prefetches that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that was not supplie= d by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that was no= t supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that was not supplied by the L3 cache.= ", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that was not supplied by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC08000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that was not supplied= by the L3 cache.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.L3_MISS", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x3FFFC00800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data read requests that miss th= e L3 cache.", + "EventCode": "0xb0", + "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Cycles where at least one demand data read re= quest known to have missed the L3 cache is pending.", + "CounterMask": "1", + "EventCode": "0x60", + "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEM= AND_DATA_RD", + "PublicDescription": "Cycles where at least one demand data read r= equest known to have missed the L3 cache is pending. 
Note that this does n= ot capture all elapsed cycles while requests are outstanding - only cycles = from when the requests were known to have missed the L3 cache.", + "SampleAfterValue": "1000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of times an RTM execution aborted.", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED", + "PublicDescription": "Counts the number of times RTM abort was tri= ggered.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to none of the previous 4 categories (e.g. interrupt)", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_EVENTS", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to none of the previous 4 categories (e.g. interrupt).", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to various memory events (e.g. read/write capacity and conflicts)", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_MEM", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to various memory events (e.g. read/write capacity and conflict= s).", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to incompatible memory type", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to incompatible memory type.", + "SampleAfterValue": "100003", + "UMask": "0x40" + }, + { + "BriefDescription": "Number of times an RTM execution aborted due = to HLE-unfriendly instructions", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", + "PublicDescription": "Counts the number of times an RTM execution = aborted due to HLE-unfriendly instructions.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of times an RTM execution successfully= committed", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.COMMIT", + "PublicDescription": "Counts the number of times RTM commit succee= ded.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times an RTM execution started.", + "EventCode": "0xc9", + "EventName": "RTM_RETIRED.START", + "PublicDescription": "Counts the number of times we entered an RTM= region. 
Does not count nested transactions.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of times a class of instruc= tions that may cause a transactional abort was executed inside a transactio= nal region", + "EventCode": "0x5d", + "EventName": "TX_EXEC.MISC2", + "PublicDescription": "Counts Unfriendly TSX abort triggered by a v= zeroupper instruction.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times an instruction execution caus= ed the transactional nest count supported to be exceeded", + "EventCode": "0x5d", + "EventName": "TX_EXEC.MISC3", + "PublicDescription": "Counts Unfriendly TSX abort triggered by a n= est count that is too deep.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Speculatively counts the number of TSX aborts= due to a data capacity limitation for transactional reads", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_CAPACITY_READ", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional reads", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Speculatively counts the number of TSX aborts= due to a data capacity limitation for transactional writes.", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", + "PublicDescription": "Speculatively counts the number of Transacti= onal Synchronization Extensions (TSX) aborts due to a data capacity limitat= ion for transactional writes.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of times a transactional abort was sig= naled due to a data conflict on a transactionally accessed address", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_CONFLICT", + "PublicDescription": "Counts the number of times a TSX line had a = cache conflict.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of times an HLE transactional executio= n aborted due to XRELEASE lock not satisfying the address and value require= ments in the elision buffer", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_MISMATCH", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to release/commit but data and address mismatch.", + "SampleAfterValue": "100003", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of times an HLE transactional executio= n aborted due to NoAllocatedElisionBuffer being non-zero.", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_NOT_EMPTY", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to commit but Lock Buffer not empty.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Number of times an HLE transactional executio= n aborted due to an unsupported read alignment from the elision buffer.", + "EventCode": "0x54", + "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMEN= T", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to attempting an unsupported alignment from Lock Buffer.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of times a HLE transactional region ab= orted due to a non XRELEASE prefixed instruction writing to an elided lock = in the elision buffer", + "EventCode": "0x54", + "EventName": 
"TX_MEM.ABORT_HLE_STORE_TO_ELIDED_LOCK", + "PublicDescription": "Counts the number of times a TSX Abort was t= riggered due to a non-release/commit store to lock.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of times HLE lock could not be elided = due to ElisionBufferAvailable being zero.", + "EventCode": "0x54", + "EventName": "TX_MEM.HLE_ELISION_BUFFER_FULL", + "PublicDescription": "Counts the number of times we could not allo= cate Lock Buffer.", + "SampleAfterValue": "100003", + "UMask": "0x40" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json b/= tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json new file mode 100644 index 000000000000..a151ba9cccb0 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json @@ -0,0 +1,113 @@ +{ + "Backend": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "Bad": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "BadSpec": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "BigFoot": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "BrMispredicts": "Grouping from Top-down Microarchitecture Analysis Me= trics spreadsheet", + "Branches": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "CacheMisses": "Grouping from Top-down Microarchitecture Analysis Metr= ics spreadsheet", + "CodeGen": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "Compute": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "Cor": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "DSB": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "DSBmiss": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "DataSharing": "Grouping from Top-down Microarchitecture Analysis Metr= ics spreadsheet", + "Fed": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "FetchBW": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "FetchLat": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "Flops": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "FpScalar": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "FpVector": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "Frontend": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "HPC": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "IcMiss": "Grouping from Top-down Microarchitecture Analysis Metrics s= preadsheet", + "InsType": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "L2Evicts": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "LSD": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "MachineClears": "Grouping from Top-down Microarchitecture Analysis Me= trics spreadsheet", + "Mem": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "MemoryBW": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "MemoryBound": "Grouping from Top-down Microarchitecture Analysis Metr= ics spreadsheet", + "MemoryLat": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "MemoryTLB": "Grouping from Top-down Microarchitecture Analysis 
Metric= s spreadsheet", + "Memory_BW": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "Memory_Lat": "Grouping from Top-down Microarchitecture Analysis Metri= cs spreadsheet", + "MicroSeq": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "OS": "Grouping from Top-down Microarchitecture Analysis Metrics sprea= dsheet", + "Offcore": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "PGO": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Pipeline": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "PortsUtil": "Grouping from Top-down Microarchitecture Analysis Metric= s spreadsheet", + "Power": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "Prefetches": "Grouping from Top-down Microarchitecture Analysis Metri= cs spreadsheet", + "Ret": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Retire": "Grouping from Top-down Microarchitecture Analysis Metrics s= preadsheet", + "SMT": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Server": "Grouping from Top-down Microarchitecture Analysis Metrics s= preadsheet", + "Snoop": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "SoC": "Grouping from Top-down Microarchitecture Analysis Metrics spre= adsheet", + "Summary": "Grouping from Top-down Microarchitecture Analysis Metrics = spreadsheet", + "TmaL1": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "TmaL2": "Grouping from Top-down Microarchitecture Analysis Metrics sp= readsheet", + "TmaL3mem": "Grouping from Top-down Microarchitecture Analysis Metrics= spreadsheet", + "TopdownL1": "Metrics for top-down breakdown at level 1", + "TopdownL2": "Metrics for top-down breakdown at level 2", + "TopdownL3": "Metrics for top-down breakdown at level 3", + "TopdownL4": "Metrics for top-down breakdown at level 4", + "TopdownL5": "Metrics for top-down breakdown at level 5", + "TopdownL6": "Metrics for top-down breakdown at level 6", + "tma_L1_group": "Metrics for top-down breakdown at level 1", + "tma_L2_group": "Metrics for top-down breakdown at level 2", + "tma_L3_group": "Metrics for top-down breakdown at level 3", + "tma_L4_group": "Metrics for top-down breakdown at level 4", + "tma_L5_group": "Metrics for top-down breakdown at level 5", + "tma_L6_group": "Metrics for top-down breakdown at level 6", + "tma_alu_op_utilization_group": "Metrics contributing to tma_alu_op_ut= ilization category", + "tma_backend_bound_group": "Metrics contributing to tma_backend_bound = category", + "tma_bad_speculation_group": "Metrics contributing to tma_bad_speculat= ion category", + "tma_branch_resteers_group": "Metrics contributing to tma_branch_reste= ers category", + "tma_core_bound_group": "Metrics contributing to tma_core_bound catego= ry", + "tma_dram_bound_group": "Metrics contributing to tma_dram_bound catego= ry", + "tma_dtlb_load_group": "Metrics contributing to tma_dtlb_load category= ", + "tma_dtlb_store_group": "Metrics contributing to tma_dtlb_store catego= ry", + "tma_fetch_bandwidth_group": "Metrics contributing to tma_fetch_bandwi= dth category", + "tma_fetch_latency_group": "Metrics contributing to tma_fetch_latency = category", + "tma_fp_arith_group": "Metrics contributing to tma_fp_arith category", + "tma_fp_vector_group": "Metrics contributing to tma_fp_vector category= ", + "tma_frontend_bound_group": "Metrics contributing to 
tma_frontend_boun= d category", + "tma_heavy_operations_group": "Metrics contributing to tma_heavy_opera= tions category", + "tma_issue2P": "Metrics related by the issue $issue2P", + "tma_issueBC": "Metrics related by the issue $issueBC", + "tma_issueBM": "Metrics related by the issue $issueBM", + "tma_issueBW": "Metrics related by the issue $issueBW", + "tma_issueD0": "Metrics related by the issue $issueD0", + "tma_issueFB": "Metrics related by the issue $issueFB", + "tma_issueFL": "Metrics related by the issue $issueFL", + "tma_issueL1": "Metrics related by the issue $issueL1", + "tma_issueLat": "Metrics related by the issue $issueLat", + "tma_issueMC": "Metrics related by the issue $issueMC", + "tma_issueMS": "Metrics related by the issue $issueMS", + "tma_issueMV": "Metrics related by the issue $issueMV", + "tma_issueRFO": "Metrics related by the issue $issueRFO", + "tma_issueSL": "Metrics related by the issue $issueSL", + "tma_issueSO": "Metrics related by the issue $issueSO", + "tma_issueSmSt": "Metrics related by the issue $issueSmSt", + "tma_issueSpSt": "Metrics related by the issue $issueSpSt", + "tma_issueSyncxn": "Metrics related by the issue $issueSyncxn", + "tma_issueTLB": "Metrics related by the issue $issueTLB", + "tma_l1_bound_group": "Metrics contributing to tma_l1_bound category", + "tma_l3_bound_group": "Metrics contributing to tma_l3_bound category", + "tma_light_operations_group": "Metrics contributing to tma_light_opera= tions category", + "tma_load_op_utilization_group": "Metrics contributing to tma_load_op_= utilization category", + "tma_mem_latency_group": "Metrics contributing to tma_mem_latency cate= gory", + "tma_memory_bound_group": "Metrics contributing to tma_memory_bound ca= tegory", + "tma_microcode_sequencer_group": "Metrics contributing to tma_microcod= e_sequencer category", + "tma_mite_group": "Metrics contributing to tma_mite category", + "tma_ports_utilization_group": "Metrics contributing to tma_ports_util= ization category", + "tma_ports_utilized_0_group": "Metrics contributing to tma_ports_utili= zed_0 category", + "tma_ports_utilized_3m_group": "Metrics contributing to tma_ports_util= ized_3m category", + "tma_retiring_group": "Metrics contributing to tma_retiring category", + "tma_serializing_operation_group": "Metrics contributing to tma_serial= izing_operation category", + "tma_store_bound_group": "Metrics contributing to tma_store_bound cate= gory", + "tma_store_op_utilization_group": "Metrics contributing to tma_store_o= p_utilization category" +} diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/other.json b/tools/p= erf/pmu-events/arch/x86/rocketlake/other.json new file mode 100644 index 000000000000..cfb590632918 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/other.json @@ -0,0 +1,242 @@ +[ + { + "BriefDescription": "Core cycles where the core was running in a m= anner where Turbo may be clipped to the Non-AVX turbo schedule.", + "EventCode": "0x28", + "EventName": "CORE_POWER.LVL0_TURBO_LICENSE", + "PublicDescription": "Counts Core cycles where the core was runnin= g with power-delivery for baseline license level 0. 
This includes non-AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes.", + "SampleAfterValue": "200003", + "UMask": "0x7" + }, + { + "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX2 turbo schedule.", + "EventCode": "0x28", + "EventName": "CORE_POWER.LVL1_TURBO_LICENSE", + "PublicDescription": "Counts Core cycles where the core was running with power-delivery for license level 1. This includes high current AVX 256-bit instructions as well as low current AVX 512-bit instructions.", + "SampleAfterValue": "200003", + "UMask": "0x18" + }, + { + "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX512 turbo schedule.", + "EventCode": "0x28", + "EventName": "CORE_POWER.LVL2_TURBO_LICENSE", + "PublicDescription": "Core cycles where the core was running with power-delivery for license level 2 (introduced in the Skylake Server microarchitecture). This includes high current AVX 512-bit instructions.", + "SampleAfterValue": "200003", + "UMask": "0x20" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_CODE_RD.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000004", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand data reads that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_DATA_RD.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000001", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000002", +
"SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts demand reads for ownership (RFO) reque= sts and software prefetches for exclusive ownership (PREFETCHW) that DRAM s= upplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.DEMAND_RFO.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000002", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts L1 data cache prefetch requests and so= ftware prefetches (except PREFETCHW) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L1D_AND_SWPF.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000400", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch data reads (which br= ing data to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_DATA_RD.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000010", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts hardware prefetch RFOs (which bring da= ta to L2) that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.HWPF_L2_RFO.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000020", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that have any type of response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x18000", + "SampleAfterValue": "100003", + "UMask": 
"0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184008000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts miscellaneous requests, such as I/O an= d un-cacheable accesses that DRAM supplied the request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.OTHER.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184008000", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that have any type of= response.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.ANY_RESPONSE", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that DRAM supplied th= e request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000800", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts streaming stores that DRAM supplied th= e request.", + "EventCode": "0xB7, 0xBB", + "EventName": "OCR.STREAMING_WR.LOCAL_DRAM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x184000800", + "SampleAfterValue": "100003", + "UMask": "0x1" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json b/tool= s/perf/pmu-events/arch/x86/rocketlake/pipeline.json new file mode 100644 index 000000000000..375b78044f14 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json @@ -0,0 +1,801 @@ +[ + { + "BriefDescription": "Cycles when divide unit is busy executing div= ide or square root operations.", + "CounterMask": "1", + "EventCode": "0x14", + "EventName": "ARITH.DIVIDER_ACTIVE", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. 
Accounts for integer and floating-point operations.", + "SampleAfterValue": "1000003", + "UMask": "0x9" + }, + { + "BriefDescription": "Number of occurrences where a microcode assist is invoked by hardware.", + "EventCode": "0xc1", + "EventName": "ASSISTS.ANY", + "PublicDescription": "Counts the number of occurrences where a microcode assist is invoked by hardware. Examples include AD (page Access Dirty), FP and AVX related assists.", + "SampleAfterValue": "100003", + "UMask": "0x7" + }, + { + "BriefDescription": "All branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.ALL_BRANCHES", + "PEBS": "1", + "PublicDescription": "Counts all branch instructions retired.", + "SampleAfterValue": "400009" + }, + { + "BriefDescription": "Conditional branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND", + "PEBS": "1", + "PublicDescription": "Counts conditional branch instructions retired.", + "SampleAfterValue": "400009", + "UMask": "0x11" + }, + { + "BriefDescription": "Not taken branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND_NTAKEN", + "PEBS": "1", + "PublicDescription": "Counts not taken branch instructions retired.", + "SampleAfterValue": "400009", + "UMask": "0x10" + }, + { + "BriefDescription": "Taken conditional branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken conditional branch instructions retired.", + "SampleAfterValue": "400009", + "UMask": "0x1" + }, + { + "BriefDescription": "Far branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.FAR_BRANCH", + "PEBS": "1", + "PublicDescription": "Counts far branch instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x40" + }, + { + "BriefDescription": "Indirect near branch instructions retired (excluding returns)", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.INDIRECT", + "PEBS": "1", + "PublicDescription": "Counts near indirect branch instructions retired excluding returns. TSX abort is an indirect branch.", + "SampleAfterValue": "100003", + "UMask": "0x80" + }, + { + "BriefDescription": "Direct and indirect near call instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_CALL", + "PEBS": "1", + "PublicDescription": "Counts both direct and indirect near call instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x2" + }, + { + "BriefDescription": "Return instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_RETURN", + "PEBS": "1", + "PublicDescription": "Counts return instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x8" + }, + { + "BriefDescription": "Taken branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken branch instructions retired.", + "SampleAfterValue": "400009", + "UMask": "0x20" + }, + { + "BriefDescription": "All mispredicted branch instructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.ALL_BRANCHES", + "PEBS": "1", + "PublicDescription": "Counts all the retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor incorrectly predicts the destination of the branch.
When the misprediction is discovered at execution, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.", + "SampleAfterValue": "50021" + }, + { + "BriefDescription": "Mispredicted conditional branch instructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND", + "PEBS": "1", + "PublicDescription": "Counts mispredicted conditional branch instructions retired.", + "SampleAfterValue": "50021", + "UMask": "0x11" + }, + { + "BriefDescription": "Mispredicted non-taken conditional branch instructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND_NTAKEN", + "PEBS": "1", + "PublicDescription": "Counts the number of conditional branch instructions retired that were mispredicted and the branch direction was not taken.", + "SampleAfterValue": "50021", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of branch instructions retired that were mispredicted and taken.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken conditional mispredicted branch instructions retired.", + "SampleAfterValue": "50021", + "UMask": "0x1" + }, + { + "BriefDescription": "All mispredicted indirect branch instructions retired (excluding RETs; a TSX abort is considered an indirect branch).", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.INDIRECT", + "PEBS": "1", + "PublicDescription": "Counts all mispredicted indirect branch instructions retired (excluding RETs; a TSX abort is considered an indirect branch).", + "SampleAfterValue": "50021", + "UMask": "0x80" + }, + { + "BriefDescription": "Mispredicted indirect CALL instructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.INDIRECT_CALL", + "PEBS": "1", + "PublicDescription": "Counts retired mispredicted indirect (near taken) CALL instructions, including both register and memory indirect.", + "SampleAfterValue": "50021", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of near branch instructions retired that were mispredicted and taken.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts number of near branch instructions retired that were mispredicted and taken.", + "SampleAfterValue": "50021", + "UMask": "0x20" + }, + { + "BriefDescription": "This event counts the number of mispredicted ret instructions retired. Non PEBS", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.RET", + "PEBS": "1", + "PublicDescription": "This is a non-precise version (that is, does not use PEBS) of the event that counts mispredicted return instructions retired.", + "SampleAfterValue": "50021", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycle counts are evenly distributed between active threads in the Core.", + "EventCode": "0xec", + "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", + "PublicDescription": "This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread.
To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Core crystal clock cycles when this thread is= unhalted and the other thread is halted.", + "EventCode": "0x3C", + "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", + "SampleAfterValue": "25003", + "UMask": "0x2" + }, + { + "BriefDescription": "Core crystal clock cycles. Cycle counts are e= venly distributed between active threads in the Core.", + "EventCode": "0x3c", + "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Reference cycles when the core is not in halt= state.", + "EventName": "CPU_CLK_UNHALTED.REF_TSC", + "PublicDescription": "Counts the number of reference cycles when t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction or the MWAIT instruction. This event is not affe= cted by core frequency changes (for example, P states, TM2 transitions) but= has the same incrementing frequency as the time stamp counter. This event = can approximate elapsed time while the core was not in a halt state. This e= vent has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is c= ounted on a dedicated fixed counter, leaving the eight programmable counter= s available for other events. Note: On all current platforms this event sto= ps counting during 'throttling (TM)' states duty off periods the processor = is 'halted'. The counter update is done at a lower clock rate then the cor= e clock the overflow status bit for this counter may appear 'sticky'. Afte= r the counter has overflowed and software clears the overflow status bit an= d resets the counter to less than MAX. The reset value to the counter is no= t clocked immediately so the overflow status bit will flip 'high (1)' and g= enerate another PMI (if enabled) after which the reset value gets clocked i= nto the counter. Therefore, software will get the interrupt, read the overf= low status bit '1 for bit 34 while the counter value is less than MAX. Soft= ware should ignore this case.", + "SampleAfterValue": "2000003", + "UMask": "0x3" + }, + { + "BriefDescription": "Core crystal clock cycles when the thread is = unhalted.", + "EventCode": "0x3C", + "EventName": "CPU_CLK_UNHALTED.REF_XCLK", + "PublicDescription": "Counts core crystal clock cycles when the th= read is unhalted.", + "SampleAfterValue": "25003", + "UMask": "0x1" + }, + { + "BriefDescription": "Core cycles when the thread is not in halt st= ate", + "EventName": "CPU_CLK_UNHALTED.THREAD", + "PublicDescription": "Counts the number of core cycles while the t= hread is not in a halt state. The thread enters the halt state when it is r= unning the HLT instruction. This event is a component in many key event rat= ios. The core frequency may change from time to time due to transitions ass= ociated with Enhanced Intel SpeedStep Technology or TM2. 
For this reason th= is event may have a changing ratio with regards to time. When the core freq= uency is constant, this event can approximate elapsed time while the core w= as not in the halt state. It is counted on a dedicated fixed counter, leavi= ng the eight programmable counters available for other events.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Thread cycles when thread is not in halt stat= e", + "EventCode": "0x3C", + "EventName": "CPU_CLK_UNHALTED.THREAD_P", + "PublicDescription": "This is an architectural event that counts t= he number of thread cycles while the thread is not in a halt state. The thr= ead enters the halt state when it is running the HLT instruction. The core = frequency may change from time to time due to power or thermal throttling. = For this reason, this event may have a changing ratio with regards to wall = clock time.", + "SampleAfterValue": "2000003" + }, + { + "BriefDescription": "Cycles while L1 cache miss demand load is out= standing.", + "CounterMask": "8", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles while L2 cache miss demand load is out= standing.", + "CounterMask": "1", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles while memory subsystem has an outstand= ing load.", + "CounterMask": "16", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", + "SampleAfterValue": "1000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Execution stalls while L1 cache miss demand l= oad is outstanding.", + "CounterMask": "12", + "EventCode": "0xA3", + "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0xc" + }, + { + "BriefDescription": "Execution stalls while L2 cache miss demand l= oad is outstanding.", + "CounterMask": "5", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x5" + }, + { + "BriefDescription": "Execution stalls while memory subsystem has a= n outstanding load.", + "CounterMask": "20", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_MEM_ANY", + "SampleAfterValue": "1000003", + "UMask": "0x14" + }, + { + "BriefDescription": "Total execution stalls.", + "CounterMask": "4", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", + "SampleAfterValue": "1000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles total of 1 uop is executed on all port= s and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles total of 2 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles total of 3 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", + 
"PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x8" + }, + { + "BriefDescription": "Cycles total of 4 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Cycles where the Store Buffer was full and no= loads caused an execution stall.", + "CounterMask": "2", + "EventCode": "0xA6", + "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", + "SampleAfterValue": "1000003", + "UMask": "0x40" + }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to DECODE.LCP]", + "EventCode": "0x87", + "EventName": "ILD_STALL.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to DECODE.LCP]", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "Instruction decoders utilized in a cycle", + "EventCode": "0x55", + "EventName": "INST_DECODED.DECODERS", + "PublicDescription": "Number of decoders utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of instructions retired. Fixed Counter= - architectural event", + "EventName": "INST_RETIRED.ANY", + "PEBS": "1", + "PublicDescription": "Counts the number of instructions retired - = an Architectural PerfMon event. Counting continues during hardware interrup= ts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counte= d by a designated fixed counter freeing up programmable counters to count o= ther events. INST_RETIRED.ANY_P is counted by a programmable counter.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of instructions retired. General Count= er - architectural event", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.ANY_P", + "PEBS": "1", + "PublicDescription": "Counts the number of instructions retired - = an Architectural PerfMon event. Counting continues during hardware interrup= ts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counte= d by a designated fixed counter freeing up programmable counters to count o= ther events. 
INST_RETIRED.ANY_P is counted by a programmable counter.", + "SampleAfterValue": "2000003" + }, + { + "BriefDescription": "Number of all retired NOP instructions.", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.NOP", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Precise instruction retired event with a redu= ced effect of PEBS shadow in IP distribution", + "EventName": "INST_RETIRED.PREC_DIST", + "PEBS": "1", + "PublicDescription": "A version of INST_RETIRED that allows for a = more unbiased distribution of samples across instructions retired. It utili= zes the Precise Distribution of Instructions Retired (PDIR) feature to miti= gate some bias in how retired instructions get sampled. Use on Fixed Counte= r 0.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles without actually retired instructions.= ", + "CounterMask": "1", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "This event counts cycles without actually re= tired instructions.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles the Backend cluster is recovering afte= r a miss-speculation or a Store Buffer or Load Buffer drain stall.", + "CounterMask": "1", + "EventCode": "0x0D", + "EventName": "INT_MISC.ALL_RECOVERY_CYCLES", + "PublicDescription": "Counts cycles the Backend cluster is recover= ing after a miss-speculation or a Store Buffer or Load Buffer drain stall.", + "SampleAfterValue": "2000003", + "UMask": "0x3" + }, + { + "BriefDescription": "Clears speculative count", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x0D", + "EventName": "INT_MISC.CLEARS_COUNT", + "PublicDescription": "Counts the number of speculative clears due = to any type of branch misprediction or machine clears", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts cycles after recovery from a branch mi= sprediction or machine clear till the first uop is issued from the resteere= d path.", + "EventCode": "0x0d", + "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", + "SampleAfterValue": "500009", + "UMask": "0x80" + }, + { + "BriefDescription": "Core cycles the allocator was stalled due to = recovery from earlier clear event for this thread", + "EventCode": "0x0D", + "EventName": "INT_MISC.RECOVERY_CYCLES", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, + { + "BriefDescription": "TMA slots where uops got dropped", + "EventCode": "0x0d", + "EventName": "INT_MISC.UOP_DROPPING", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", + "SampleAfterValue": "1000003", + "UMask": "0x10" + }, + { + "BriefDescription": "The number of times that split load operation= s are temporarily blocked because all resources for handling the split acce= sses are in use.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.NO_SR", + "PublicDescription": "Counts the number of times that split load o= perations are temporarily blocked because all resources for handling the sp= lit accesses are in use.", + "SampleAfterValue": "100003", + "UMask": 
"0x8" + }, + { + "BriefDescription": "Loads blocked due to overlapping with a prece= ding store that cannot be forwarded.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.STORE_FORWARD", + "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "False dependencies due to partial compare on = address.", + "EventCode": "0x07", + "EventName": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS", + "PublicDescription": "Counts the number of times a load got blocke= d due to false dependencies due to partial compare on address.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of demand load dispatches t= hat hit L1D fill buffer (FB) allocated for software prefetch.", + "EventCode": "0x4c", + "EventName": "LOAD_HIT_PREFETCH.SWPF", + "PublicDescription": "Counts all not software-prefetch load dispat= ches that hit the fill buffer (FB) allocated for the software prefetch. It = can also be incremented by some lock instructions. So it should only be use= d with profiling so that the locks can be excluded by ASM (Assembly File) i= nspection of the nearby instructions.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles Uops delivered by the LSD, but didn't = come from the decoder.", + "CounterMask": "1", + "EventCode": "0xA8", + "EventName": "LSD.CYCLES_ACTIVE", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles optimal number of Uops delivered by th= e LSD, but did not come from the decoder.", + "CounterMask": "5", + "EventCode": "0xa8", + "EventName": "LSD.CYCLES_OK", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of Uops delivered by the LSD.", + "EventCode": "0xa8", + "EventName": "LSD.UOPS", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of machine clears (nukes) of any type.= ", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.COUNT", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "Self-modifying code (SMC) detected.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.SMC", + "PublicDescription": "Counts self-modifying code (SMC) detected, w= hich causes a machine clear.", + "SampleAfterValue": "100003", + "UMask": "0x4" + }, + { + "BriefDescription": "Increments whenever there is an update to the= LBR array.", + "EventCode": "0xcc", + "EventName": "MISC_RETIRED.LBR_INSERTS", + "PublicDescription": "Increments when an entry is added to the Las= t Branch Record (LBR) array (or removed from the array in case of RETURNs i= n call stack mode). 
The event requires LBR to be enabled properly.", + "SampleAfterValue": "100003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of retired PAUSE instructions. This event is not supported on first SKL and KBL products.", + "EventCode": "0xcc", + "EventName": "MISC_RETIRED.PAUSE_INST", + "PublicDescription": "Counts number of retired PAUSE instructions. This event is not supported on first SKL and KBL products.", + "SampleAfterValue": "100003", + "UMask": "0x40" + }, + { + "BriefDescription": "Cycles stalled due to no store buffers available (not including draining from sync).", + "EventCode": "0xa2", + "EventName": "RESOURCE_STALLS.SB", + "PublicDescription": "Counts allocation stall cycles caused by the store buffer (SB) being full. This counts cycles that the pipeline back-end blocked uop delivery from the front-end.", + "SampleAfterValue": "100003", + "UMask": "0x8" + }, + { + "BriefDescription": "Counts cycles where the pipeline is stalled due to serializing operations.", + "EventCode": "0xa2", + "EventName": "RESOURCE_STALLS.SCOREBOARD", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles when Reservation Station (RS) is empty for the thread", + "EventCode": "0x5e", + "EventName": "RS_EVENTS.EMPTY_CYCLES", + "PublicDescription": "Counts cycles during which the reservation station (RS) is empty for this logical processor. This is usually caused when the front-end pipeline runs into starvation periods (e.g. branch mispredictions or i-cache misses).", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts end of periods where the Reservation Station (RS) was empty.", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x5E", + "EventName": "RS_EVENTS.EMPTY_END", + "Invert": "1", + "PublicDescription": "Counts end of periods where the Reservation Station (RS) was empty. Could be useful to closely sample on front-end latency issues (see the FRONTEND_RETIRED event of designated precise events).", + "SampleAfterValue": "100003", + "UMask": "0x1" + }, + { + "BriefDescription": "TMA slots where no uops were being issued due to lack of back-end resources.", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", + "PublicDescription": "Counts the number of Top-down Microarchitecture Analysis (TMA) method's slots where no micro-operations were being issued from front-end to back-end of the machine due to lack of back-end resources.", + "SampleAfterValue": "10000003", + "UMask": "0x2" + }, + { + "BriefDescription": "TMA slots wasted due to incorrect speculation by branch mispredictions", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", + "PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions. This event estimates number of operations that were issued but not retired from the speculative path as well as the out-of-order engine recovery past a branch misprediction.", + "SampleAfterValue": "10000003", + "UMask": "0x8" + }, + { + "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event", + "EventName": "TOPDOWN.SLOTS", + "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA).
The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).", + "SampleAfterValue": "10000003", + "UMask": "0x4" + }, + { + "BriefDescription": "TMA slots available for an unhalted logical processor. General counter - architectural event", + "EventCode": "0xa4", + "EventName": "TOPDOWN.SLOTS_P", + "PublicDescription": "Counts the number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.", + "SampleAfterValue": "10000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of uops decoded out of instructions exclusively fetched by decoder 0", + "EventCode": "0x56", + "EventName": "UOPS_DECODED.DEC0", + "PublicDescription": "Uops exclusively fetched by decoder 0", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of uops executed on port 0", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_0", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 0.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Number of uops executed on port 1", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_1", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 1.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Number of uops executed on port 2 and 3", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_2_3", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 2 and 3.", + "SampleAfterValue": "2000003", + "UMask": "0x4" + }, + { + "BriefDescription": "Number of uops executed on port 4 and 9", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_4_9", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 4 and 9.", + "SampleAfterValue": "2000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Number of uops executed on port 5", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_5", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 5.", + "SampleAfterValue": "2000003", + "UMask": "0x20" + }, + { + "BriefDescription": "Number of uops executed on port 6", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_6", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 6.", + "SampleAfterValue": "2000003", + "UMask": "0x40" + }, + { + "BriefDescription": "Number of uops executed on port 7 and 8", + "EventCode": "0xa1", + "EventName": "UOPS_DISPATCHED.PORT_7_8", + "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the
Reservation Station (RS) to ports 7 and 8.", + "SampleAfterValue": "2000003", + "UMask": "0x80" + }, + { + "BriefDescription": "Number of uops executed on the core.", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE", + "PublicDescription": "Counts the number of uops executed from any thread.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 1 micro-op is executed from any thread on physical core.", + "CounterMask": "1", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", + "PublicDescription": "Counts cycles when at least 1 micro-op is executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 2 micro-ops are executed from any thread on physical core.", + "CounterMask": "2", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", + "PublicDescription": "Counts cycles when at least 2 micro-ops are executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 3 micro-ops are executed from any thread on physical core.", + "CounterMask": "3", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", + "PublicDescription": "Counts cycles when at least 3 micro-ops are executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles at least 4 micro-ops are executed from any thread on physical core.", + "CounterMask": "4", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", + "PublicDescription": "Counts cycles when at least 4 micro-ops are executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles where at least 1 uop was executed per-thread", + "CounterMask": "1", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_1", + "PublicDescription": "Cycles where at least 1 uop was executed per-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where at least 2 uops were executed per-thread", + "CounterMask": "2", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_2", + "PublicDescription": "Cycles where at least 2 uops were executed per-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where at least 3 uops were executed per-thread", + "CounterMask": "3", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_3", + "PublicDescription": "Cycles where at least 3 uops were executed per-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles where at least 4 uops were executed per-thread", + "CounterMask": "4", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_4", + "PublicDescription": "Cycles where at least 4 uops were executed per-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts number of cycles no uops were dispatched to be executed on this thread.", + "CounterMask": "1", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts cycles during which no uops were dispatched from the Reservation Station (RS) per thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of uops to be executed per-
thread each cycle.", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.THREAD", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Counts the number of x87 uops dispatched.", + "EventCode": "0xB1", + "EventName": "UOPS_EXECUTED.X87", + "PublicDescription": "Counts the number of x87 uops executed.", + "SampleAfterValue": "2000003", + "UMask": "0x10" + }, + { + "BriefDescription": "Uops that RAT issues to RS", + "EventCode": "0x0e", + "EventName": "UOPS_ISSUED.ANY", + "PublicDescription": "Counts the number of uops that the Resource Allocation Table (RAT) issues to the Reservation Station (RS).", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Cycles when RAT does not issue Uops to RS for the thread", + "CounterMask": "1", + "EventCode": "0x0E", + "EventName": "UOPS_ISSUED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts cycles during which the Resource Allocation Table (RAT) does not issue any Uops to the reservation station (RS) for the current thread.", + "SampleAfterValue": "1000003", + "UMask": "0x1" + }, + { + "BriefDescription": "Uops inserted at issue-stage in order to preserve upper bits of vector registers.", + "EventCode": "0x0e", + "EventName": "UOPS_ISSUED.VECTOR_WIDTH_MISMATCH", + "PublicDescription": "Counts the number of Blend Uops issued by the Resource Allocation Table (RAT) to the reservation station (RS) in order to preserve upper bits of vector registers. Starting with the Skylake microarchitecture, these Blend uops are needed since every Intel SSE instruction executed in Dirty Upper State needs to preserve bits 128-255 of the destination register. For more information, refer to 'Mixing Intel AVX and Intel SSE Code' section of the Optimization Guide.", + "SampleAfterValue": "100003", + "UMask": "0x2" + }, + { + "BriefDescription": "Retirement slots used.", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.SLOTS", + "PublicDescription": "Counts the retirement slots used each cycle.", + "SampleAfterValue": "2000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles without actually retired uops.", + "CounterMask": "1", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.STALL_CYCLES", + "Invert": "1", + "PublicDescription": "This event counts cycles without actually retired uops.", + "SampleAfterValue": "1000003", + "UMask": "0x2" + }, + { + "BriefDescription": "Cycles with less than 10 actually retired uops.", + "CounterMask": "10", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.TOTAL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts the number of cycles using always true condition (uops_ret < 10) applied to non PEBS uops retired event.", + "SampleAfterValue": "1000003", + "UMask": "0x2" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json b/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json new file mode 100644 index 000000000000..1bb9cededa56 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json @@ -0,0 +1,1571 @@ +[ + { + "BriefDescription": "C10 residency percent per package", + "MetricExpr": "cstate_pkg@c10\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C10_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C2 residency percent per package", + "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C2_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C3 residency percent per package", +
"MetricExpr": "cstate_pkg@c3\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C3_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C6 residency percent per core", + "MetricExpr": "cstate_core@c6\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C6_Core_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C6 residency percent per package", + "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C6_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C7 residency percent per core", + "MetricExpr": "cstate_core@c7\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C7_Core_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C7 residency percent per package", + "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C7_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C8 residency percent per package", + "MetricExpr": "cstate_pkg@c8\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C8_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "C9 residency percent per package", + "MetricExpr": "cstate_pkg@c9\\-residency@ / TSC", + "MetricGroup": "Power", + "MetricName": "C9_Pkg_Residency", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Uncore frequency per die [GHZ]", + "MetricExpr": "tma_info_system_socket_clks / #num_dies / duration_= time / 1e9", + "MetricGroup": "SoC", + "MetricName": "UNCORE_FREQ" + }, + { + "BriefDescription": "Percentage of cycles spent in System Manageme= nt Interrupts.", + "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0= else 0)", + "MetricGroup": "smi", + "MetricName": "smi_cycles", + "MetricThreshold": "smi_cycles > 0.1", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Number of SMI interrupts.", + "MetricExpr": "msr@smi@", + "MetricGroup": "smi", + "MetricName": "smi_num", + "ScaleUnit": "1SMI#" + }, + { + "BriefDescription": "This metric estimates how often memory load a= ccesses were aliased by preceding stores (in program order) with a 4K addre= ss offset", + "MetricExpr": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS / tma_info_thread_c= lks", + "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group", + "MetricName": "tma_4k_aliasing", + "MetricThreshold": "tma_4k_aliasing > 0.2 & (tma_l1_bound > 0.1 & = (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates how often memory load = accesses were aliased by preceding stores (in program order) with a 4K addr= ess offset. False match is possible; which incur a few cycles load re-issue= . However; the short re-issue duration is often hidden by the out-of-order = core and HW optimizations; hence a user may safely ignore a high value of t= his metric unless it manages to propagate up into parent nodes of the hiera= rchy (e.g. 
to L1_Bound).", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution ports for ALU operations.", + "MetricExpr": "(UOPS_DISPATCHED.PORT_0 + UOPS_DISPATCHED.PORT_1 + UOPS_DISPATCHED.PORT_5 + UOPS_DISPATCHED.PORT_6) / (4 * tma_info_core_core_clks)", + "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group", + "MetricName": "tma_alu_op_utilization", + "MetricThreshold": "tma_alu_op_utilization > 0.6", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists", + "MetricExpr": "100 * ASSISTS.ANY / tma_info_thread_slots", + "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_group", + "MetricName": "tma_assists", + "MetricThreshold": "tma_assists > 0.1 & (tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1)", + "PublicDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists. Assists are long sequences of uops that are required in certain corner-cases for operations that cannot be handled natively by the execution pipeline. For example; when working with very small floating point values (so-called Denormals); the FP units are not set up to perform these operations natively. Instead; a sequence of instructions to perform the computation on the Denormals is injected into the pipeline. Since these microcode sequences might be dozens of uops long; Assists can be extremely deleterious to performance and they can be avoided in many cases. Sample with: ASSISTS.ANY", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT_MISC.RECOVERY_CYCLES\\,cmask\\=1\\,edge@ / tma_info_thread_slots", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_backend_bound", + "MetricThreshold": "tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend. Backend is the portion of the processor core where the out-of-order scheduler dispatches ready uops into their respective execution units; and once completed these uops get retired according to program order. For example; stalls due to data-cache misses or stalls due to the divider unit being overloaded are both categorized under Backend Bound. Backend Bound is further divided into two main categories: Memory Bound and Core Bound.
Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots wasted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_bad_speculation", + "MetricThreshold": "tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots wasted due to incorrect speculations. This includes slots used to issue uops that do not eventually get retired and slots for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work due to miss-predicted branches is categorized under Bad Speculation category. Incorrect data speculation followed by Memory Ordering Nukes is another example.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots where the CPU was retiring branch instructions.", + "MetricExpr": "tma_light_operations * BR_INST_RETIRED.ALL_BRANCHES / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group", + "MetricName": "tma_branch_instructions", + "MetricThreshold": "tma_branch_instructions > 0.1 & tma_light_operations > 0.6", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction", + "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * tma_bad_speculation", + "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM", + "MetricName": "tma_branch_mispredicts", + "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction. These slots are either wasted by uops fetched from an incorrectly speculated program path; or stalls when the out-of-order part of the machine needs to recover its state from a speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics: tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers", + "MetricExpr": "INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clks + tma_unknown_branches", + "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group", + "MetricName": "tma_branch_resteers", + "MetricThreshold": "tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers. Branch Resteers estimates the Frontend delay in fetching operations from corrected path; following all sorts of miss-predicted branches. For example; branchy code with lots of miss-predictions might get categorized under Branch Resteers. Note the value of this node may overlap with its siblings.
Sample with: BR_MISP_RETIRED.ALL_BRANCHES", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction", + "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)", + "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_group", + "MetricName": "tma_cisc", + "MetricThreshold": "tma_cisc > 0.1 & (tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1)", + "PublicDescription": "This metric estimates fraction of cycles the CPU retired uops originated from CISC (complex instruction set computer) instruction. A CISC instruction has multiple uops that are required to perform the instruction's functionality as in the case of read-modify-write as an example. Since these instructions require multiple uops they may or may not imply sub-optimal use of machine resources.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears", + "MetricExpr": "(1 - BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT)) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clks", + "MetricGroup": "BadSpec;MachineClears;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueMC", + "MetricName": "tma_clears_resteers", + "MetricThreshold": "tma_clears_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Machine Clears. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES. Related metrics: tma_l1_bound, tma_machine_clears, tma_microcode_sequencer, tma_ms_switches", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(29 * tma_info_system_average_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM + 23.5 * tma_info_system_average_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", + "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", + "MetricName": "tma_contested_accesses", + "MetricThreshold": "tma_contested_accesses > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses. Contested accesses occur when data written by one Logical Processor are read by another Logical Processor on a different Physical Core. Examples of contested accesses include synchronizations such as locks; true data sharing such as modified locked variables; and false sharing. Sample with: MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM_PS;MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS_PS.
Related metrics: tma_data_sharing, tma_false_sharing, tma_machine_clears, tma_remote_cache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots where Core non-memory issues were the bottleneck", + "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)", + "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group", + "MetricName": "tma_core_bound", + "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots where Core non-memory issues were the bottleneck. Shortage in hardware compute resources; or dependencies in software's instructions are both categorized under Core Bound. Hence it may indicate the machine ran out of an out-of-order resource; certain execution units are overloaded or dependencies in program's data- or instruction-flow are limiting the performance (e.g. FP-chained long-latency arithmetic operations).", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "23.5 * tma_info_system_average_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", + "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", + "MetricName": "tma_data_sharing", + "MetricThreshold": "tma_data_sharing > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Data shared by multiple Logical Processors (even just read shared) may cause increased access latency due to cache coherency. Excessive data sharing can drastically harm multithreaded performance. Sample with: MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT_PS. Related metrics: tma_contested_accesses, tma_false_sharing, tma_machine_clears, tma_remote_cache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles where decoder-0 was the only active decoder", + "MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=1@ - cpu@INST_DECODED.DECODERS\\,cmask\\=2@) / tma_info_core_core_clks / 2", + "MetricGroup": "DSBmiss;FetchBW;TopdownL4;tma_L4_group;tma_issueD0;tma_mite_group", + "MetricName": "tma_decoder0_alone", + "MetricThreshold": "tma_decoder0_alone > 0.1 & (tma_mite > 0.1 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35))", + "PublicDescription": "This metric represents fraction of cycles where decoder-0 was the only active decoder. Related metrics: tma_few_uops_instructions", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles where the Divider unit was active", + "MetricExpr": "ARITH.DIVIDER_ACTIVE / tma_info_thread_clks", + "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group", + "MetricName": "tma_divider", + "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric represents fraction of cycles where the Divider unit was active.
Divide and square root instructions are performed by the Divider unit and can take considerably longer latency than integer or Floating Point addition; subtraction; or multiplication. Sample with: ARITH.DIVIDER_ACTIVE", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "CYCLE_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks - tma_l2_bound", + "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group", + "MetricName": "tma_dram_bound", + "MetricThreshold": "tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)", + "PublicDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads. Better caching can improve the latency and increase performance. Sample with: MEM_LOAD_RETIRED.L3_MISS_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline", + "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / tma_info_core_core_clks / 2", + "MetricGroup": "DSB;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandwidth_group", + "MetricName": "tma_dsb", + "MetricThreshold": "tma_dsb > 0.15 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35)", + "PublicDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to DSB (decoded uop cache) fetch pipeline. For example; inefficient utilization of the DSB cache structure or bank conflict when reading from it; are categorized here.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines", + "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / tma_info_thread_clks", + "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB", + "MetricName": "tma_dsb_switches", + "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivers higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty. Sample with: FRONTEND_RETIRED.DSB_MISS_PS.
Related metrics: tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses", + "MetricExpr": "min(7 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", + "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group", + "MetricName": "tma_dtlb_load", + "MetricThreshold": "tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses. TLBs (Translation Look-aside Buffers) are processor caches for recently used entries out of the Page Tables that are used to map virtual- to physical-addresses by the operating system. This metric approximates the potential delay of demand loads missing the first-level data TLB (assuming worst case scenario with back to back misses to different pages). This includes hitting in the second-level TLB (STLB) as well as performing a hardware page walk on an STLB miss. Sample with: MEM_INST_RETIRED.STLB_MISS_LOADS_PS. Related metrics: tma_dtlb_store, tma_info_bottleneck_memory_data_tlbs", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses", + "MetricExpr": "(7 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_STORE_MISSES.WALK_ACTIVE) / tma_info_core_core_clks", + "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_store_bound_group", + "MetricName": "tma_dtlb_store", + "MetricThreshold": "tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses. As with ordinary data caching; focus on improving data locality and reducing working-set size to reduce DTLB overhead. Additionally; consider using profile-guided optimization (PGO) to collocate frequently-used data on the same page. Try using larger page sizes for large amounts of frequently-used data. Sample with: MEM_INST_RETIRED.STLB_MISS_STORES_PS. Related metrics: tma_dtlb_load, tma_info_bottleneck_memory_data_tlbs", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing", + "MetricExpr": "32.5 * tma_info_system_average_frequency * OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM / tma_info_thread_clks", + "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_store_bound_group", + "MetricName": "tma_false_sharing", + "MetricThreshold": "tma_false_sharing > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", + "PublicDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing. False Sharing is a multithreading hiccup; where multiple Logical Processors contend on different data-elements mapped into the same cache line. Sample with: OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM.
Related metrics: tma_contested_accesses, tma_data_sharing, tma_machine_clears, tma_remote_cache", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed", + "MetricExpr": "L1D_PEND_MISS.FB_FULL / tma_info_thread_clks", + "MetricGroup": "MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_issueSL;tma_issueSmSt;tma_l1_bound_group", + "MetricName": "tma_fb_full", + "MetricThreshold": "tma_fb_full > 0.3", + "PublicDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed. The higher the metric value; the deeper the memory hierarchy level the misses are satisfied from (metric values >1 are valid). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or external memory). Related metrics: tma_info_bottleneck_memory_bandwidth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_latency, tma_streaming_stores", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues", + "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)", + "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group;tma_issueFB", + "MetricName": "tma_fetch_bandwidth", + "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues", + "MetricExpr": "(5 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - INT_MISC.UOP_DROPPING) / tma_info_thread_slots", + "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend_bound_group", + "MetricName": "tma_fetch_latency", + "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Frontend Latency. In such cases; the Frontend eventually delivers no uops for some period.
Sample with: FRONTEND_RETIRED.LATENCY_GE_16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots where the CPU was retiring instructions that are decoded into two or up to ([SNB+] four; [ADL+] five) uops", + "MetricExpr": "tma_heavy_operations - tma_microcode_sequencer", + "MetricGroup": "TopdownL3;tma_L3_group;tma_heavy_operations_group;tma_issueD0", + "MetricName": "tma_few_uops_instructions", + "MetricThreshold": "tma_few_uops_instructions > 0.05 & tma_heavy_operations > 0.1", + "PublicDescription": "This metric represents fraction of slots where the CPU was retiring instructions that are decoded into two or up to ([SNB+] four; [ADL+] five) uops. This highly-correlates with the number of uops in such instructions. Related metrics: tma_decoder0_alone", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "tma_x87_use + tma_fp_scalar + tma_fp_vector", + "MetricGroup": "HPC;TopdownL3;tma_L3_group;tma_light_operations_group", + "MetricName": "tma_fp_arith", + "MetricThreshold": "tma_fp_arith > 0.2 & tma_light_operations > 0.6", + "PublicDescription": "This metric represents overall arithmetic floating-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may exceed its parent due to use of \"Uops\" CountDomain and FMA double-counting.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired", + "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P", + "MetricName": "tma_fp_scalar", + "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)", + "PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired. May overcount due to FMA double counting. Related metrics: tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths", + "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@ / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P", + "MetricName": "tma_fp_vector", + "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)", + "PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths. May overcount due to FMA double counting.
Related metrics: tma_fp_scalar, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors", + "MetricExpr": "(FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE) / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector_group;tma_issue2P", + "MetricName": "tma_fp_vector_128b", + "MetricThreshold": "tma_fp_vector_128b > 0.1 & (tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", + "PublicDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May overcount due to FMA double counting. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors", + "MetricExpr": "(FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE) / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector_group;tma_issue2P", + "MetricName": "tma_fp_vector_256b", + "MetricThreshold": "tma_fp_vector_256b > 0.1 & (tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", + "PublicDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May overcount due to FMA double counting. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors", + "MetricExpr": "(FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) / (tma_retiring * tma_info_thread_slots)", + "MetricGroup": "Compute;Flops;TopdownL5;tma_L5_group;tma_fp_vector_group;tma_issue2P", + "MetricName": "tma_fp_vector_512b", + "MetricThreshold": "tma_fp_vector_512b > 0.1 & (tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6))", + "PublicDescription": "This metric approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors. May overcount due to FMA double counting.
Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_port_0, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", + "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricName": "tma_frontend_bound", + "MetricThreshold": "tma_frontend_bound > 0.15", + "MetricgroupNoGroup": "TopdownL1;Default", + "PublicDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend. Frontend denotes the first part of the processor core responsible to fetch operations that are executed later on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch; cache-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into micro-operations (uops). Ideally the Frontend can issue Pipeline_Width uops every cycle to the Backend. Frontend Bound denotes unutilized issue-slots when there is no Backend stall; i.e. bubbles where Frontend delivered no uops while Backend could have accepted them. For example; stalls due to instruction-cache misses would be categorized under Frontend Bound. Sample with: FRONTEND_RETIRED.LATENCY_GE_4_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences", + "MetricExpr": "tma_microcode_sequencer + tma_retiring * (UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0\\,cmask\\=1@) / IDQ.MITE_UOPS", + "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group", + "MetricName": "tma_heavy_operations", + "MetricThreshold": "tma_heavy_operations > 0.1", + "MetricgroupNoGroup": "TopdownL2", + "PublicDescription": "This metric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro-coded sequences. This highly-correlates with the uop length of these instructions/sequences.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses", + "MetricExpr": "ICACHE_16B.IFDATA_STALL / tma_info_thread_clks", + "MetricGroup": "BigFoot;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma_fetch_latency_group", + "MetricName": "tma_icache_misses", + "MetricThreshold": "tma_icache_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses.
Sample with: FRONTEND_RETIRED.L2_MISS_PS;FRONTEND_RETIRED.L1I_MISS_PS", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_thread_slots / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;BrMispredicts;tma_issueBM", + "MetricName": "tma_info_bad_spec_branch_misprediction_cost", + "PublicDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear). Related metrics: tma_branch_mispredicts, tma_info_bottleneck_mispredictions, tma_mispredicts_resteers" + }, + { + "BriefDescription": "Instructions per retired mispredicts for conditional non-taken branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_NTAKEN", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_cond_ntaken", + "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_ntaken < 200" + }, + { + "BriefDescription": "Instructions per retired mispredicts for conditional taken branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_TAKEN", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_cond_taken", + "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_taken < 200" + }, + { + "BriefDescription": "Instructions per retired mispredicts for indirect CALL or JMP branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.INDIRECT", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_indirect", + "MetricThreshold": "tma_info_bad_spec_ipmisp_indirect < 1e3" + }, + { + "BriefDescription": "Instructions per retired mispredicts for return branches (lower number means higher occurrence rate).", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.RET", + "MetricGroup": "Bad;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmisp_ret", + "MetricThreshold": "tma_info_bad_spec_ipmisp_ret < 500" + }, + { + "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;BadSpec;BrMispredicts", + "MetricName": "tma_info_bad_spec_ipmispredict", + "MetricThreshold": "tma_info_bad_spec_ipmispredict < 200" + }, + { + "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t_utilization > 0.5 else 0)", + "MetricGroup": "Cor;SMT", + "MetricName": "tma_info_botlnk_l0_core_bound_likely", + "MetricThreshold": "tma_info_botlnk_l0_core_bound_likely > 0.5" + }, + { + "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp +
tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))", + "MetricGroup": "DSBmiss;Fed;tma_issueFB", + "MetricName": "tma_info_botlnk_l2_dsb_misses", + "MetricThreshold": "tma_info_botlnk_l2_dsb_misses > 10", + "PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" + }, + { + "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck", + "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", + "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL", + "MetricName": "tma_info_botlnk_l2_ic_misses", + "MetricThreshold": "tma_info_botlnk_l2_ic_misses > 5", + "PublicDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck. Related metrics: " + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_icache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", + "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB;tma_issueBC", + "MetricName": "tma_info_bottleneck_big_code", + "MetricThreshold": "tma_info_bottleneck_big_code > 20", + "PublicDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses). Related metrics: tma_info_bottleneck_branching_overhead" + }, + { + "BriefDescription": "Total pipeline cost of branch related instructions (used for program control-flow including function calls)", + "MetricExpr": "100 * ((BR_INST_RETIRED.COND + 3 * BR_INST_RETIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL)) / tma_info_thread_slots)", + "MetricGroup": "Ret;tma_issueBC", + "MetricName": "tma_info_bottleneck_branching_overhead", + "MetricThreshold": "tma_info_bottleneck_branching_overhead > 10", + "PublicDescription": "Total pipeline cost of branch related instructions (used for program control-flow including function calls).
Related metrics: tma_info_bottleneck_big_code" + }, + { + "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottleneck_big_code", + "MetricGroup": "Fed;FetchBW;Frontend", + "MetricName": "tma_info_bottleneck_instruction_fetch_bw", + "MetricThreshold": "tma_info_bottleneck_instruction_fetch_bw > 20" + }, + { + "BriefDescription": "Total pipeline cost of (external) Memory Bandwidth related bottlenecks", + "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk))", + "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", + "MetricName": "tma_info_bottleneck_memory_bandwidth", + "MetricThreshold": "tma_info_bottleneck_memory_bandwidth > 20", + "PublicDescription": "Total pipeline cost of (external) Memory Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" + }, + { + "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", + "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB", + "MetricName": "tma_info_bottleneck_memory_data_tlbs", + "MetricThreshold": "tma_info_bottleneck_memory_data_tlbs > 20", + "PublicDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs).
Related metrics: tma_dtlb_load, tma_dtlb_store" + }, + { + "BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))", + "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat", + "MetricName": "tma_info_bottleneck_memory_latency", + "MetricThreshold": "tma_info_bottleneck_memory_latency > 20", + "PublicDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches). Related metrics: tma_l3_hit_latency, tma_mem_latency" + }, + { + "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", + "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM", + "MetricName": "tma_info_bottleneck_mispredictions", + "MetricThreshold": "tma_info_bottleneck_mispredictions > 20", + "PublicDescription": "Total pipeline cost of Branch Misprediction related bottlenecks. Related metrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost, tma_mispredicts_resteers" + }, + { + "BriefDescription": "Fraction of branches that are CALL or RET", + "MetricExpr": "(BR_INST_RETIRED.NEAR_CALL + BR_INST_RETIRED.NEAR_RETURN) / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches", + "MetricName": "tma_info_branches_callret" + }, + { + "BriefDescription": "Fraction of branches that are non-taken conditionals", + "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches;CodeGen;PGO", + "MetricName": "tma_info_branches_cond_nt" + }, + { + "BriefDescription": "Fraction of branches that are taken conditionals", + "MetricExpr": "BR_INST_RETIRED.COND_TAKEN / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches;CodeGen;PGO", + "MetricName": "tma_info_branches_cond_tk" + }, + { + "BriefDescription": "Fraction of branches that are unconditional (direct or indirect) jumps", + "MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR_CALL) / BR_INST_RETIRED.ALL_BRANCHES", + "MetricGroup": "Bad;Branches", + "MetricName": "tma_info_branches_jump" + }, + { + "BriefDescription": "Fraction of branches of other types (not individually covered by other metrics in Info.Branches group)", + "MetricExpr": "1 - (tma_info_branches_cond_nt + tma_info_branches_cond_tk + tma_info_branches_callret + tma_info_branches_jump)", + "MetricGroup": "Bad;Branches", + "MetricName": "tma_info_branches_other_branches" + }, + { + "BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core", + "MetricExpr": "CPU_CLK_UNHALTED.DISTRIBUTED", + "MetricGroup": "SMT", + "MetricName": "tma_info_core_core_clks" + }, + { + "BriefDescription": "Instructions Per Cycle
across hyper-threads (per physical core)", + "MetricExpr": "INST_RETIRED.ANY / tma_info_core_core_clks", + "MetricGroup": "Ret;SMT;TmaL1;tma_L1_group", + "MetricName": "tma_info_core_coreipc" + }, + { + "BriefDescription": "Floating Point Operations Per Cycle", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=0x18@ + 8 * cpu@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE\\,umask\\=0x60@ + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) / tma_info_core_core_clks", + "MetricGroup": "Flops;Ret", + "MetricName": "tma_info_core_flopc" + }, + { + "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)", + "MetricConstraint": "NO_GROUP_EVENTS", + "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@) / (2 * tma_info_core_core_clks)", + "MetricGroup": "Cor;Flops;HPC", + "MetricName": "tma_info_core_fp_arith_utilization", + "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)." + }, + { + "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core", + "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 if #SMT_on else UOPS_EXECUTED.CORE_CYCLES_GE_1)", + "MetricGroup": "Backend;Cor;Pipeline;PortsUtil", + "MetricName": "tma_info_core_ilp" + }, + { + "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)", + "MetricExpr": "IDQ.DSB_UOPS / UOPS_ISSUED.ANY", + "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", + "MetricName": "tma_info_frontend_dsb_coverage", + "MetricThreshold": "tma_info_frontend_dsb_coverage < 0.7 & tma_info_thread_ipc / 5 > 0.35", + "PublicDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache).
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_inst_mix_iptb, tma_lcp" + }, + { + "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.", + "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=1\\,edge@", + "MetricGroup": "DSBmiss", + "MetricName": "tma_info_frontend_dsb_switch_cost" + }, + { + "BriefDescription": "Average number of Uops issued by front-end when it issued something", + "MetricExpr": "UOPS_ISSUED.ANY / cpu@UOPS_ISSUED.ANY\\,cmask\\=1@", + "MetricGroup": "Fed;FetchBW", + "MetricName": "tma_info_frontend_fetch_upc" + }, + { + "BriefDescription": "Average Latency for L1 instruction cache misses", + "MetricExpr": "ICACHE_16B.IFDATA_STALL / cpu@ICACHE_16B.IFDATA_STALL\\,cmask\\=1\\,edge@", + "MetricGroup": "Fed;FetchLat;IcMiss", + "MetricName": "tma_info_frontend_icache_miss_latency" + }, + { + "BriefDescription": "Instructions per non-speculative DSB miss (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS", + "MetricGroup": "DSBmiss;Fed", + "MetricName": "tma_info_frontend_ipdsb_miss_ret", + "MetricThreshold": "tma_info_frontend_ipdsb_miss_ret < 50" + }, + { + "BriefDescription": "Instructions per speculative Unknown Branch Misprediction (BAClear) (lower number means higher occurrence rate)", + "MetricExpr": "tma_info_inst_mix_instructions / BACLEARS.ANY", + "MetricGroup": "Fed", + "MetricName": "tma_info_frontend_ipunknown_branch" + }, + { + "BriefDescription": "L2 cache true code cacheline misses per kilo instruction", + "MetricExpr": "1e3 * FRONTEND_RETIRED.L2_MISS / INST_RETIRED.ANY", + "MetricGroup": "IcMiss", + "MetricName": "tma_info_frontend_l2mpki_code" + }, + { + "BriefDescription": "L2 cache speculative code cacheline misses per kilo instruction", + "MetricExpr": "1e3 * L2_RQSTS.CODE_RD_MISS / INST_RETIRED.ANY", + "MetricGroup": "IcMiss", + "MetricName": "tma_info_frontend_l2mpki_code_all" + }, + { + "BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)", + "MetricExpr": "LSD.UOPS / UOPS_ISSUED.ANY", + "MetricGroup": "Fed;LSD", + "MetricName": "tma_info_frontend_lsd_coverage" + }, + { + "BriefDescription": "Branch instructions per taken branch.", + "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN", + "MetricGroup": "Branches;Fed;PGO", + "MetricName": "tma_info_inst_mix_bptkbranch" + }, + { + "BriefDescription": "Total number of retired Instructions", + "MetricExpr": "INST_RETIRED.ANY", + "MetricGroup": "Summary;TmaL1;tma_L1_group", + "MetricName": "tma_info_inst_mix_instructions", + "PublicDescription": "Total number of retired Instructions. Sample with: INST_RETIRED.PREC_DIST" + }, + { + "BriefDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)", + "MetricExpr": "INST_RETIRED.ANY / (cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@)", + "MetricGroup": "Flops;InsType", + "MetricName": "tma_info_inst_mix_iparith", + "MetricThreshold": "tma_info_inst_mix_iparith < 10", + "PublicDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate). May undercount due to FMA double counting. Approximated prior to BDW."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE)",
+        "MetricGroup": "Flops;FpVector;InsType",
+        "MetricName": "tma_info_inst_mix_iparith_avx128",
+        "MetricThreshold": "tma_info_inst_mix_iparith_avx128 < 10",
+        "PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE)",
+        "MetricGroup": "Flops;FpVector;InsType",
+        "MetricName": "tma_info_inst_mix_iparith_avx256",
+        "MetricThreshold": "tma_info_inst_mix_iparith_avx256 < 10",
+        "PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE + FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE)",
+        "MetricGroup": "Flops;FpVector;InsType",
+        "MetricName": "tma_info_inst_mix_iparith_avx512",
+        "MetricThreshold": "tma_info_inst_mix_iparith_avx512 < 10",
+        "PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "MetricGroup": "Flops;FpScalar;InsType",
+        "MetricName": "tma_info_inst_mix_iparith_scalar_dp",
+        "MetricThreshold": "tma_info_inst_mix_iparith_scalar_dp < 10",
+        "PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+        "MetricGroup": "Flops;FpScalar;InsType",
+        "MetricName": "tma_info_inst_mix_iparith_scalar_sp",
+        "MetricThreshold": "tma_info_inst_mix_iparith_scalar_sp < 10",
+        "PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate). May undercount due to FMA double counting."
+    },
+    {
+        "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricGroup": "Branches;Fed;InsType",
+        "MetricName": "tma_info_inst_mix_ipbranch",
+        "MetricThreshold": "tma_info_inst_mix_ipbranch < 8"
+    },
+    {
+        "BriefDescription": "Instructions per (near) call (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
+        "MetricGroup": "Branches;Fed;PGO",
+        "MetricName": "tma_info_inst_mix_ipcall",
+        "MetricThreshold": "tma_info_inst_mix_ipcall < 200"
+    },
+    {
+        "BriefDescription": "Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / (cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=0x18@ + 8 * cpu@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE\\,umask\\=0x60@ + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE)",
+        "MetricGroup": "Flops;InsType",
+        "MetricName": "tma_info_inst_mix_ipflop",
+        "MetricThreshold": "tma_info_inst_mix_ipflop < 10"
+    },
+    {
+        "BriefDescription": "Instructions per Load (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
+        "MetricGroup": "InsType",
+        "MetricName": "tma_info_inst_mix_ipload",
+        "MetricThreshold": "tma_info_inst_mix_ipload < 3"
+    },
+    {
+        "BriefDescription": "Instructions per Store (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
+        "MetricGroup": "InsType",
+        "MetricName": "tma_info_inst_mix_ipstore",
+        "MetricThreshold": "tma_info_inst_mix_ipstore < 8"
+    },
+    {
+        "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
+        "MetricExpr": "INST_RETIRED.ANY / cpu@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
+        "MetricGroup": "Prefetches",
+        "MetricName": "tma_info_inst_mix_ipswpf",
+        "MetricThreshold": "tma_info_inst_mix_ipswpf < 100"
+    },
+    {
+        "BriefDescription": "Instruction per taken branch",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO;tma_issueFB",
+        "MetricName": "tma_info_inst_mix_iptb",
+        "MetricThreshold": "tma_info_inst_mix_iptb < 11",
+        "PublicDescription": "Instruction per taken branch. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_lcp"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "64 * L1D.REPLACEMENT / 1e9 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "tma_info_memory_core_l1d_cache_fill_bw"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "64 * L2_LINES_IN.ALL / 1e9 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "tma_info_memory_core_l2_cache_fill_bw"
+    },
+    {
+        "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1e9 / duration_time",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "tma_info_memory_core_l3_cache_access_bw"
+    },
+    {
+        "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1e9 / duration_time",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "tma_info_memory_core_l3_cache_fill_bw"
+    },
+    {
+        "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
+        "MetricExpr": "1e3 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_fb_hpki"
+    },
+    {
+        "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L1_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l1mpki"
+    },
+    {
+        "BriefDescription": "L1 cache true misses per kilo instruction for all demand loads (including speculative)",
+        "MetricExpr": "1e3 * L2_RQSTS.ALL_DEMAND_DATA_RD / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l1mpki_load"
+    },
+    {
+        "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)",
+        "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l2hpki_all"
+    },
+    {
+        "BriefDescription": "L2 cache hits per kilo instruction for all demand loads (including speculative)",
+        "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_HIT / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l2hpki_load"
+    },
+    {
+        "BriefDescription": "L2 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L2_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "Backend;CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l2mpki"
+    },
+    {
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all request types (including speculative)",
+        "MetricExpr": "1e3 * L2_RQSTS.MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem;Offcore",
+        "MetricName": "tma_info_memory_l2mpki_all"
+    },
+    {
+        "BriefDescription": "L2 cache ([RKL+] true) misses per kilo instruction for all demand loads (including speculative)",
+        "MetricExpr": "1e3 * L2_RQSTS.DEMAND_DATA_RD_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l2mpki_load"
+    },
+    {
+        "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads",
+        "MetricExpr": "1e3 * MEM_LOAD_RETIRED.L3_MISS / INST_RETIRED.ANY",
+        "MetricGroup": "CacheMisses;Mem",
+        "MetricName": "tma_info_memory_l3mpki"
+    },
+    {
+        "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
+        "MetricExpr": "L1D_PEND_MISS.PENDING / (MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT)",
+        "MetricGroup": "Mem;MemoryBound;MemoryLat",
+        "MetricName": "tma_info_memory_load_miss_real_latency"
+    },
+    {
+        "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss",
+        "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES",
+        "MetricGroup": "Mem;MemoryBW;MemoryBound",
+        "MetricName": "tma_info_memory_mlp",
+        "PublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)"
+    },
+    {
+        "BriefDescription": "Average Parallel L2 cache miss data reads",
+        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD / OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD",
+        "MetricGroup": "Memory_BW;Offcore",
+        "MetricName": "tma_info_memory_oro_data_l2_mlp"
+    },
+    {
+        "BriefDescription": "Average Latency for L2 cache miss demand Loads",
+        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / OFFCORE_REQUESTS.DEMAND_DATA_RD",
+        "MetricGroup": "Memory_Lat;Offcore",
+        "MetricName": "tma_info_memory_oro_load_l2_miss_latency"
+    },
+    {
+        "BriefDescription": "Average Parallel L2 cache miss demand Loads",
+        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / cpu@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=1@",
+        "MetricGroup": "Memory_BW;Offcore",
+        "MetricName": "tma_info_memory_oro_load_l2_mlp"
+    },
+    {
+        "BriefDescription": "Average Latency for L3 cache miss demand Loads",
+        "MetricExpr": "cpu@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,umask\\=0x10@ / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD",
+        "MetricGroup": "Memory_Lat;Offcore",
+        "MetricName": "tma_info_memory_oro_load_l3_miss_latency"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
+        "MetricExpr": "tma_info_memory_core_l1d_cache_fill_bw",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "tma_info_memory_thread_l1d_cache_fill_bw_1t"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
+        "MetricExpr": "tma_info_memory_core_l2_cache_fill_bw",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "tma_info_memory_thread_l2_cache_fill_bw_1t"
+    },
+    {
+        "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "tma_info_memory_core_l3_cache_access_bw",
+        "MetricGroup": "Mem;MemoryBW;Offcore",
+        "MetricName": "tma_info_memory_thread_l3_cache_access_bw_1t"
+    },
+    {
+        "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
+        "MetricExpr": "tma_info_memory_core_l3_cache_fill_bw",
+        "MetricGroup": "Mem;MemoryBW",
+        "MetricName": "tma_info_memory_thread_l3_cache_fill_bw_1t"
+    },
+    {
+        "BriefDescription": "STLB (2nd level TLB) code speculative misses per kilo instruction (misses of any page-size that complete the page walk)",
+        "MetricExpr": "1e3 * ITLB_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+        "MetricGroup": "Fed;MemoryTLB",
+        "MetricName": "tma_info_memory_tlb_code_stlb_mpki"
+    },
+    {
+        "BriefDescription": "STLB (2nd level TLB) data load speculative misses per kilo instruction (misses of any page-size that complete the page walk)",
+        "MetricExpr": "1e3 * DTLB_LOAD_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;MemoryTLB",
+        "MetricName": "tma_info_memory_tlb_load_stlb_mpki"
+    },
+    {
+        "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses",
+        "MetricExpr": "(ITLB_MISSES.WALK_PENDING + DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING) / (2 * tma_info_core_core_clks)",
+        "MetricGroup": "Mem;MemoryTLB",
+        "MetricName": "tma_info_memory_tlb_page_walks_utilization",
+        "MetricThreshold": "tma_info_memory_tlb_page_walks_utilization > 0.5"
+    },
+    {
+        "BriefDescription": "STLB (2nd level TLB) data store speculative misses per kilo instruction (misses of any page-size that complete the page walk)",
+        "MetricExpr": "1e3 * DTLB_STORE_MISSES.WALK_COMPLETED / INST_RETIRED.ANY",
+        "MetricGroup": "Mem;MemoryTLB",
+        "MetricName": "tma_info_memory_tlb_store_stlb_mpki"
+    },
+    {
+        "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-thread",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
+        "MetricName": "tma_info_pipeline_execute"
+    },
+    {
+        "BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
+        "MetricConstraint": "NO_GROUP_EVENTS",
+        "MetricExpr": "tma_retiring * tma_info_thread_slots / cpu@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
+        "MetricGroup": "Pipeline;Ret",
+        "MetricName": "tma_info_pipeline_retire"
+    },
+    {
+        "BriefDescription": "Measured Average Frequency for unhalted processors [GHz]",
+        "MetricExpr": "tma_info_system_turbo_utilization * TSC / 1e9 / duration_time",
+        "MetricGroup": "Power;Summary",
+        "MetricName": "tma_info_system_average_frequency"
+    },
+    {
+        "BriefDescription": "Average CPU Utilization",
+        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
+        "MetricGroup": "HPC;Summary",
+        "MetricName": "tma_info_system_cpu_utilization"
+    },
+    {
+        "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
+        "MetricExpr": "64 * (UNC_ARB_TRK_REQUESTS.ALL + UNC_ARB_COH_TRK_REQUESTS.ALL) / 1e6 / duration_time / 1e3",
+        "MetricGroup": "HPC;Mem;MemoryBW;SoC;tma_issueBW",
+        "MetricName": "tma_info_system_dram_bw_use",
+        "PublicDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]. Related metrics: tma_fb_full, tma_info_bottleneck_memory_bandwidth, tma_mem_bandwidth, tma_sq_full"
+    },
+    {
+        "BriefDescription": "Giga Floating Point Operations Per Second",
+        "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * cpu@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE\\,umask\\=0x18@ + 8 * cpu@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE\\,umask\\=0x60@ + 16 * FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE) / 1e9 / duration_time",
+        "MetricGroup": "Cor;Flops;HPC",
+        "MetricName": "tma_info_system_gflops",
+        "PublicDescription": "Giga Floating Point Operations Per Second. Aggregate across all supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
+    },
+    {
+        "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
+        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.FAR_BRANCH:u",
+        "MetricGroup": "Branches;OS",
+        "MetricName": "tma_info_system_ipfarbranch",
+        "MetricThreshold": "tma_info_system_ipfarbranch < 1e6"
+    },
+    {
+        "BriefDescription": "Cycles Per Instruction for the Operating System (OS) Kernel mode",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / INST_RETIRED.ANY_P:k",
+        "MetricGroup": "OS",
+        "MetricName": "tma_info_system_kernel_cpi"
+    },
+    {
+        "BriefDescription": "Fraction of cycles spent in the Operating System (OS) Kernel mode",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD_P:k / CPU_CLK_UNHALTED.THREAD",
+        "MetricGroup": "OS",
+        "MetricName": "tma_info_system_kernel_utilization",
+        "MetricThreshold": "tma_info_system_kernel_utilization > 0.05"
+    },
+    {
+        "BriefDescription": "Average number of parallel data read requests to external memory",
+        "MetricExpr": "UNC_ARB_DAT_OCCUPANCY.RD / cpu@UNC_ARB_DAT_OCCUPANCY.RD\\,cmask\\=1@",
+        "MetricGroup": "Mem;MemoryBW;SoC",
+        "MetricName": "tma_info_system_mem_parallel_reads",
+        "PublicDescription": "Average number of parallel data read requests to external memory. Accounts for demand loads and L1/L2 prefetches"
+    },
+    {
+        "BriefDescription": "Average latency of data read request to external memory (in nanoseconds)",
+        "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.RD + UNC_ARB_DAT_OCCUPANCY.RD) / UNC_ARB_TRK_REQUESTS.RD",
+        "MetricGroup": "Mem;MemoryLat;SoC",
+        "MetricName": "tma_info_system_mem_read_latency",
+        "PublicDescription": "Average latency of data read request to external memory (in nanoseconds). Accounts for demand loads and L1/L2 prefetches. ([RKL+]memory-controller only)"
+    },
+    {
+        "BriefDescription": "Average latency of all requests to external memory (in Uncore cycles)",
+        "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.ALL + UNC_ARB_DAT_OCCUPANCY.RD) / UNC_ARB_TRK_REQUESTS.ALL",
+        "MetricGroup": "Mem;SoC",
+        "MetricName": "tma_info_system_mem_request_latency"
+    },
+    {
+        "BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for baseline license level 0",
+        "MetricExpr": "CORE_POWER.LVL0_TURBO_LICENSE / tma_info_core_core_clks",
+        "MetricGroup": "Power",
+        "MetricName": "tma_info_system_power_license0_utilization",
+        "PublicDescription": "Fraction of Core cycles where the core was running with power-delivery for baseline license level 0. This includes non-AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes."
+    },
+    {
+        "BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 1",
+        "MetricExpr": "CORE_POWER.LVL1_TURBO_LICENSE / tma_info_core_core_clks",
+        "MetricGroup": "Power",
+        "MetricName": "tma_info_system_power_license1_utilization",
+        "MetricThreshold": "tma_info_system_power_license1_utilization > 0.5",
+        "PublicDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 1. This includes high current AVX 256-bit instructions as well as low current AVX 512-bit instructions."
+    },
+    {
+        "BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 2 (introduced in SKX)",
+        "MetricExpr": "CORE_POWER.LVL2_TURBO_LICENSE / tma_info_core_core_clks",
+        "MetricGroup": "Power",
+        "MetricName": "tma_info_system_power_license2_utilization",
+        "MetricThreshold": "tma_info_system_power_license2_utilization > 0.5",
+        "PublicDescription": "Fraction of Core cycles where the core was running with power-delivery for license level 2 (introduced in SKX). This includes high current AVX 512-bit instructions."
+    },
+    {
+        "BriefDescription": "Fraction of cycles where both hardware Logical Processors were active",
+        "MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_on else 0)",
+        "MetricGroup": "SMT",
+        "MetricName": "tma_info_system_smt_2t_utilization"
+    },
+    {
+        "BriefDescription": "Socket actual clocks when any core is active on that socket",
+        "MetricExpr": "UNC_CLOCK.SOCKET",
+        "MetricGroup": "SoC",
+        "MetricName": "tma_info_system_socket_clks"
+    },
+    {
+        "BriefDescription": "Average Frequency Utilization relative nominal frequency",
+        "MetricExpr": "tma_info_thread_clks / CPU_CLK_UNHALTED.REF_TSC",
+        "MetricGroup": "Power",
+        "MetricName": "tma_info_system_turbo_utilization"
+    },
+    {
+        "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
+        "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
+        "MetricGroup": "Pipeline",
+        "MetricName": "tma_info_thread_clks"
+    },
+    {
+        "BriefDescription": "Cycles Per Instruction (per Logical Processor)",
+        "MetricExpr": "1 / tma_info_thread_ipc",
+        "MetricGroup": "Mem;Pipeline",
+        "MetricName": "tma_info_thread_cpi"
+    },
+    {
+        "BriefDescription": "The ratio of Executed- by Issued-Uops",
+        "MetricExpr": "UOPS_EXECUTED.THREAD / UOPS_ISSUED.ANY",
+        "MetricGroup": "Cor;Pipeline",
+        "MetricName": "tma_info_thread_execute_per_issue",
+        "PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggests high rate of \"execute\" at rename stage."
+    },
+    {
+        "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
+        "MetricExpr": "INST_RETIRED.ANY / tma_info_thread_clks",
+        "MetricGroup": "Ret;Summary",
+        "MetricName": "tma_info_thread_ipc"
+    },
+    {
+        "BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor ICL onward)",
+        "MetricExpr": "TOPDOWN.SLOTS",
+        "MetricGroup": "TmaL1;tma_L1_group",
+        "MetricName": "tma_info_thread_slots"
+    },
+    {
+        "BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
+        "MetricExpr": "(tma_info_thread_slots / (TOPDOWN.SLOTS / 2) if #SMT_on else 1)",
+        "MetricGroup": "SMT;TmaL1;tma_L1_group",
+        "MetricName": "tma_info_thread_slots_utilization"
+    },
+    {
+        "BriefDescription": "Uops Per Instruction",
+        "MetricExpr": "tma_retiring * tma_info_thread_slots / INST_RETIRED.ANY",
+        "MetricGroup": "Pipeline;Ret;Retire",
+        "MetricName": "tma_info_thread_uoppi",
+        "MetricThreshold": "tma_info_thread_uoppi > 1.05"
+    },
+    {
+        "BriefDescription": "Instruction per taken branch",
+        "MetricExpr": "tma_retiring * tma_info_thread_slots / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricGroup": "Branches;Fed;FetchBW",
+        "MetricName": "tma_info_thread_uptb",
+        "MetricThreshold": "tma_info_thread_uptb < 7.5"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses",
+        "MetricExpr": "ICACHE_64B.IFTAG_STALL / tma_info_thread_clks",
+        "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group",
+        "MetricName": "tma_itlb_misses",
+        "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+        "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: FRONTEND_RETIRED.STLB_MISS_PS;FRONTEND_RETIRED.ITLB_MISS_PS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled without loads missing the L1 data cache",
+        "MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS) / tma_info_thread_clks, 0)",
+        "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_issueL1;tma_issueMC;tma_memory_bound_group",
+        "MetricName": "tma_l1_bound",
+        "MetricThreshold": "tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+        "PublicDescription": "This metric estimates how often the CPU was stalled without loads missing the L1 data cache. The L1 data cache typically has the shortest latency. However; in certain cases like loads blocked on older stores; a load might suffer due to high latency even though it is being satisfied by the L1. Another example is loads who miss in the TLB. These cases are characterized by execution unit stalls; while some non-completed demand load lives in the machine without having that demand load missing the L1 cache. Sample with: MEM_LOAD_RETIRED.L1_HIT_PS;MEM_LOAD_RETIRED.FB_HIT_PS. Related metrics: tma_clears_resteers, tma_machine_clears, tma_microcode_sequencer, tma_ms_switches, tma_ports_utilized_1",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads",
+        "MetricConstraint": "NO_GROUP_EVENTS",
+        "MetricExpr": "MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) / (MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + L1D_PEND_MISS.FB_FULL_PERIODS) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks)",
+        "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
+        "MetricName": "tma_l2_bound",
+        "MetricThreshold": "tma_l2_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+        "PublicDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads. Avoiding cache misses (i.e. L1 misses/L2 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RETIRED.L2_HIT_PS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
+        "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS) / tma_info_thread_clks",
+        "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
+        "MetricName": "tma_l3_bound",
+        "MetricThreshold": "tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+        "PublicDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core. Avoiding cache misses (i.e. L2 misses/L3 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)",
+        "MetricExpr": "9 * tma_info_system_average_frequency * MEM_LOAD_RETIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks",
+        "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_l3_bound_group",
+        "MetricName": "tma_l3_hit_latency",
+        "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3 hits) will improve the latency; reduce contention with sibling physical cores and increase performance. Note the value of this node may overlap with its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tma_info_bottleneck_memory_latency, tma_mem_latency",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)",
+        "MetricExpr": "ILD_STALL.LCP / tma_info_thread_clks",
+        "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB",
+        "MetricName": "tma_lcp",
+        "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+        "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation)",
+        "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)",
+        "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_group",
+        "MetricName": "tma_light_operations",
+        "MetricThreshold": "tma_light_operations > 0.6",
+        "MetricgroupNoGroup": "TopdownL2",
+        "PublicDescription": "This metric represents fraction of slots where the CPU was retiring light-weight operations -- instructions that require no more than one uop (micro-operation). This correlates with total number of instructions used by the program. A uops-per-instruction (see UopPI metric) ratio of 1 or less should be expected for decently optimized software running on Intel Core/Xeon products. While this often indicates efficient X86 instructions were executed; high value does not necessarily mean better performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port for Load operations",
+        "MetricExpr": "UOPS_DISPATCHED.PORT_2_3 / (2 * tma_info_core_core_clks)",
+        "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group",
+        "MetricName": "tma_load_op_utilization",
+        "MetricThreshold": "tma_load_op_utilization > 0.6",
+        "PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port for Load operations. Sample with: UOPS_DISPATCHED.PORT_2_3",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric roughly estimates the fraction of cycles where the (first level) DTLB was missed by load accesses, that later on hit in second-level TLB (STLB)",
+        "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss",
+        "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_load_group",
+        "MetricName": "tma_load_stlb_hit",
+        "MetricThreshold": "tma_load_stlb_hit > 0.05 & (tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)))",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB) was missed by load accesses, performing a hardware page walk",
+        "MetricExpr": "DTLB_LOAD_MISSES.WALK_ACTIVE / tma_info_thread_clks",
+        "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_load_group",
+        "MetricName": "tma_load_stlb_miss",
+        "MetricThreshold": "tma_load_stlb_miss > 0.05 & (tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)))",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations",
+        "MetricConstraint": "NO_GROUP_EVENTS",
+        "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10 * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks",
+        "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group",
+        "MetricName": "tma_lock_latency",
+        "MetricThreshold": "tma_lock_latency > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations. Due to the microarchitecture handling of locks; they are classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOADS_PS. Related metrics: tma_store_latency",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit",
+        "MetricExpr": "(LSD.CYCLES_ACTIVE - LSD.CYCLES_OK) / tma_info_core_core_clks / 2",
+        "MetricGroup": "FetchBW;LSD;TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
+        "MetricName": "tma_lsd",
+        "MetricThreshold": "tma_lsd > 0.15 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35)",
+        "PublicDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to LSD (Loop Stream Detector) unit. LSD typically does well sustaining Uop supply. However; in some rare cases; optimal uop-delivery could not be reached for small loops whose size (in terms of number of uops) does not suit well the LSD structure.",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears",
+        "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)",
+        "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn",
+        "MetricName": "tma_machine_clears",
+        "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15",
+        "MetricgroupNoGroup": "TopdownL2",
+        "PublicDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears. These slots are either wasted by uops fetched prior to the clear; or stalls the out-of-order portion of the machine needs to recover its state after the clear. For example; this can happen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modifying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sharing, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_cache",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM)",
+        "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=4@) / tma_info_thread_clks",
+        "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW",
+        "MetricName": "tma_mem_bandwidth",
+        "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory (DRAM). The underlying heuristic assumes that a similar off-core traffic is generated by all IA cores. This metric does not aggregate non-data-read requests by this logical processor; requests from other IA Logical Processors/Physical Cores/sockets; or other non-IA devices like GPU; hence the maximum external memory bandwidth limits may or may not be approached when this metric is flagged (see Uncore counters for that). Related metrics: tma_fb_full, tma_info_bottleneck_memory_bandwidth, tma_info_system_dram_bw_use, tma_sq_full",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM)",
+        "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth",
+        "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat",
+        "MetricName": "tma_mem_latency",
+        "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory (DRAM). This metric does not aggregate requests from other Logical Processors/Physical Cores/sockets (see Uncore counters for that). Related metrics: tma_info_bottleneck_memory_latency, tma_l3_hit_latency",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck",
+        "MetricExpr": "(CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES) * tma_backend_bound",
+        "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
+        "MetricName": "tma_memory_bound",
+        "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2",
+        "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck. Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two).",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
+        "MetricConstraint": "NO_GROUP_EVENTS",
+        "MetricExpr": "tma_light_operations * MEM_INST_RETIRED.ANY / INST_RETIRED.ANY",
+        "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
+        "MetricName": "tma_memory_operations",
+        "MetricThreshold": "tma_memory_operations > 0.1 & tma_light_operations > 0.6",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit",
+        "MetricExpr": "tma_retiring * tma_info_thread_slots / UOPS_ISSUED.ANY * IDQ.MS_UOPS / tma_info_thread_slots",
+        "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operations_group;tma_issueMC;tma_issueMS",
+        "MetricName": "tma_microcode_sequencer",
+        "MetricThreshold": "tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1",
+        "PublicDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit. The MS is used for CISC instructions not supported by the default decoders (like repeat move strings; or CPUID); or by microcode assists used to address some operation modes (like in Floating Point assists). These cases can often be avoided. Sample with: IDQ.MS_UOPS. Related metrics: tma_clears_resteers, tma_l1_bound, tma_machine_clears, tma_ms_switches",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage",
+        "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clks",
+        "MetricGroup": "BadSpec;BrMispredicts;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueBM",
+        "MetricName": "tma_mispredicts_resteers",
+        "MetricThreshold": "tma_mispredicts_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
+        "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES. Related metrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline)",
+        "MetricExpr": "(IDQ.MITE_CYCLES_ANY - IDQ.MITE_CYCLES_OK) / tma_info_core_core_clks / 2",
+        "MetricGroup": "DSBmiss;FetchBW;TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
+        "MetricName": "tma_mite",
+        "MetricThreshold": "tma_mite > 0.1 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35)",
+        "PublicDescription": "This metric represents Core fraction of cycles in which CPU was likely limited due to the MITE pipeline (the legacy decode pipeline). This pipeline is used for code that was not pre-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long immediate or LCP can manifest as MITE fetch bandwidth bottleneck. Sample with: FRONTEND_RETIRED.ANY_DSB_MISS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles where (only) 4 uops were delivered by the MITE pipeline",
+        "MetricExpr": "(cpu@IDQ.MITE_UOPS\\,cmask\\=4@ - cpu@IDQ.MITE_UOPS\\,cmask\\=5@) / tma_info_thread_clks",
+        "MetricGroup": "DSBmiss;FetchBW;TopdownL4;tma_L4_group;tma_mite_group",
+        "MetricName": "tma_mite_4wide",
+        "MetricThreshold": "tma_mite_4wide > 0.05 & (tma_mite > 0.1 & (tma_fetch_bandwidth > 0.1 & tma_frontend_bound > 0.15 & tma_info_thread_ipc / 5 > 0.35))",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "The Mixing_Vectors metric gives the percentage of injected blend uops out of all uops issued",
+        "MetricExpr": "UOPS_ISSUED.VECTOR_WIDTH_MISMATCH / UOPS_ISSUED.ANY",
+        "MetricGroup": "TopdownL5;tma_L5_group;tma_issueMV;tma_ports_utilized_0_group",
+        "MetricName": "tma_mixing_vectors",
+        "MetricThreshold": "tma_mixing_vectors > 0.05",
+        "PublicDescription": "The Mixing_Vectors metric gives the percentage of injected blend uops out of all uops issued. Usually a Mixing_Vectors over 5% is worth investigating. Read more in Appendix B1 of the Optimizations Guide for this topic. Related metrics: tma_ms_switches",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS)",
+        "MetricExpr": "3 * IDQ.MS_SWITCHES / tma_info_thread_clks",
+        "MetricGroup": "FetchLat;MicroSeq;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueMC;tma_issueMS;tma_issueMV;tma_issueSO",
+        "MetricName": "tma_ms_switches",
+        "MetricThreshold": "tma_ms_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
+        "PublicDescription": "This metric estimates the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode Sequencer (MS). Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legacy instruction decode) pipelines. Certain operations cannot be handled natively by the execution pipeline; and must be performed by microcode (small programs injected into the execution stream). Switching to the MS too often can negatively impact performance. The MS is designated to deliver long uop flows required by CISC instructions like CPUID; or uncommon conditions like Floating Point Assists when dealing with Denormals. Sample with: IDQ.MS_SWITCHES. Related metrics: tma_clears_resteers, tma_l1_bound, tma_machine_clears, tma_microcode_sequencer, tma_mixing_vectors, tma_serializing_operation",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions",
+        "MetricExpr": "tma_light_operations * INST_RETIRED.NOP / (tma_retiring * tma_info_thread_slots)",
+        "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
+        "MetricName": "tma_nop_instructions",
+        "MetricThreshold": "tma_nop_instructions > 0.1 & tma_light_operations > 0.6",
+        "PublicDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions. Compilers often use NOPs for certain address alignments - e.g. start address of a function or loop body. Sample with: INST_RETIRED.NOP",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
+        "MetricConstraint": "NO_GROUP_EVENTS",
+        "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_instructions + tma_nop_instructions))",
+        "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
+        "MetricName": "tma_other_light_ops",
+        "MetricThreshold": "tma_other_light_ops > 0.3 & tma_light_operations > 0.6",
+        "PublicDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes. May undercount due to FMA double counting",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)",
+        "MetricExpr": "UOPS_DISPATCHED.PORT_0 / tma_info_core_core_clks",
+        "MetricGroup": "Compute;TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
+        "MetricName": "tma_port_0",
+        "MetricThreshold": "tma_port_0 > 0.6",
+        "PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch). Sample with: UOPS_DISPATCHED.PORT_0. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_1, tma_port_5, tma_port_6, tma_ports_utilized_2",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU)",
+        "MetricExpr": "UOPS_DISPATCHED.PORT_1 / tma_info_core_core_clks",
+        "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
+        "MetricName": "tma_port_1",
+        "MetricThreshold": "tma_port_1 > 0.6",
+        "PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU). Sample with: UOPS_DISPATCHED.PORT_1. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_5, tma_port_6, tma_ports_utilized_2",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 5 ([SNB+] Branches and ALU; [HSW+] ALU)",
+        "MetricExpr": "UOPS_DISPATCHED.PORT_5 / tma_info_core_core_clks",
+        "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
+        "MetricName": "tma_port_5",
+        "MetricThreshold": "tma_port_5 > 0.6",
+        "PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 5 ([SNB+] Branches and ALU; [HSW+] ALU). Sample with: UOPS_DISPATCHED.PORT_5. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_6, tma_ports_utilized_2",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 6 ([HSW+]Primary Branch and simple ALU)",
+        "MetricExpr": "UOPS_DISPATCHED.PORT_6 / tma_info_core_core_clks",
+        "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
+        "MetricName": "tma_port_6",
+        "MetricThreshold": "tma_port_6 > 0.6",
+        "PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 6 ([HSW+]Primary Branch and simple ALU). Sample with: UOPS_DISPATCHED.PORT_6. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, tma_ports_utilized_2",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related)",
+        "MetricExpr": "((cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ + tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL)) / tma_info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) / tma_info_thread_clks)",
+        "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group",
+        "MetricName": "tma_ports_utilization",
+        "MetricThreshold": "tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)",
+        "PublicDescription": "This metric estimates fraction of cycles the CPU performance was potentially limited due to Core computation issues (non divider-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency among contiguous instructions would manifest in this metric - such cases are often referred to as low Instruction Level Parallelism (ILP). (2) Contention on some hardware execution unit other than Divider. For example; when there are too many multiply operations.",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
+        "MetricExpr": "cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ / tma_info_thread_clks + tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) / tma_info_thread_clks",
+        "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group",
+        "MetricName": "tma_ports_utilized_0",
+        "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions like divides may contribute to this metric.",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
+        "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks",
+        "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_ports_utilization_group",
+        "MetricName": "tma_ports_utilized_1",
+        "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise). This can be due to heavy data-dependency among software instructions; or over oversubscribing a particular hardware resource. In some other cases with high 1_Port_Utilized and L1_Bound; this metric can point to L1 data-cache latency bottleneck that may not necessarily manifest with complete execution starvation (due to the short L1 latency e.g. walking a linked list) - looking at the assembly can be helpful. Sample with: EXE_ACTIVITY.1_PORTS_UTIL. Related metrics: tma_l1_bound",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
+        "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks",
+        "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_ports_utilization_group",
+        "MetricName": "tma_ports_utilized_2",
+        "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise). Loop Vectorization -most compilers feature auto-Vectorization options today- reduces pressure on the execution ports as multiple elements are calculated with same uop. Sample with: EXE_ACTIVITY.2_PORTS_UTIL. Related metrics: tma_fp_scalar, tma_fp_vector, tma_fp_vector_128b, tma_fp_vector_256b, tma_fp_vector_512b, tma_port_0, tma_port_1, tma_port_5, tma_port_6",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
+        "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks",
+        "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group",
+        "MetricName": "tma_ports_utilized_3m",
+        "MetricThreshold": "tma_ports_utilized_3m > 0.7 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise). Sample with: UOPS_EXECUTED.CYCLES_GE_3",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
+        "DefaultMetricgroupName": "TopdownL1",
+        "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
+        "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
+        "MetricName": "tma_retiring",
+        "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
+        "MetricgroupNoGroup": "TopdownL1;Default",
+        "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations",
+        "MetricExpr": "RESOURCE_STALLS.SCOREBOARD / tma_info_thread_clks",
+        "MetricGroup": "PortsUtil;TopdownL5;tma_L5_group;tma_issueSO;tma_ports_utilized_0_group",
+        "MetricName": "tma_serializing_operation",
+        "MetricThreshold": "tma_serializing_operation > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)))",
+        "PublicDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; WRMSR or LFENCE serialize the out-of-order execution which may limit performance. Sample with: RESOURCE_STALLS.SCOREBOARD. Related metrics: tma_ms_switches",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions",
+        "MetricExpr": "140 * MISC_RETIRED.PAUSE_INST / tma_info_thread_clks",
+        "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_group",
+        "MetricName": "tma_slow_pause",
+        "MetricThreshold": "tma_slow_pause > 0.05 & (tma_serializing_operation > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))))",
+        "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions. Sample with: MISC_RETIRED.PAUSE_INST",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache line boundary",
+        "MetricExpr": "tma_info_memory_load_miss_real_latency * LD_BLOCKS.NO_SR / tma_info_thread_clks",
+        "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
+        "MetricName": "tma_split_loads",
+        "MetricThreshold": "tma_split_loads > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache line boundary. Sample with: MEM_INST_RETIRED.SPLIT_LOADS_PS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents rate of split store accesses",
+        "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_clks",
+        "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bound_group",
+        "MetricName": "tma_split_stores",
+        "MetricThreshold": "tma_split_stores > 0.2 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric represents rate of split store accesses. Consider aligning your data to the 64-byte cache line granularity. Sample with: MEM_INST_RETIRED.SPLIT_STORES_PS. Related metrics: tma_port_4",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)",
+        "MetricExpr": "L1D_PEND_MISS.L2_STALL / tma_info_thread_clks",
+        "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueBW;tma_l3_bound_group",
+        "MetricName": "tma_sq_full",
+        "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors). Related metrics: tma_fb_full, tma_info_bottleneck_memory_bandwidth, tma_info_system_dram_bw_use, tma_mem_bandwidth",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write",
+        "MetricExpr": "EXE_ACTIVITY.BOUND_ON_STORES / tma_info_thread_clks",
+        "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
+        "MetricName": "tma_store_bound",
+        "MetricThreshold": "tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
+        "PublicDescription": "This metric estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write. Even though store accesses do not typically stall out-of-order CPUs; there are few cases where stores can lead to actual stalls. This metric will be flagged should RFO stores be a bottleneck. Sample with: MEM_INST_RETIRED.ALL_STORES_PS",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores",
+        "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks",
+        "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
+        "MetricName": "tma_store_fwd_blk",
+        "MetricThreshold": "tma_store_fwd_blk > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores. To streamline memory operations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writing the data that the load wants to read (store forwarding process). However; in some cases the load may be blocked for a significant time pending the store forward. For example; when the prior store is writing a smaller region than the load is reading.",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses",
+        "MetricExpr": "(L2_RQSTS.RFO_HIT * 10 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) + (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) * min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / tma_info_thread_clks",
+        "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_issueSL;tma_store_bound_group",
+        "MetricName": "tma_store_latency",
+        "MetricThreshold": "tma_store_latency > 0.1 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses. Store accesses usually less impact out-of-order core performance; however; holding resources for longer time can lead into undesired implications (e.g. contention on L1D fill-buffer entries - see FB_Full). Related metrics: tma_fb_full, tma_lock_latency",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port for Store operations",
+        "MetricExpr": "(UOPS_DISPATCHED.PORT_4_9 + UOPS_DISPATCHED.PORT_7_8) / (4 * tma_info_core_core_clks)",
+        "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group",
+        "MetricName": "tma_store_op_utilization",
+        "MetricThreshold": "tma_store_op_utilization > 0.6",
+        "PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port for Store operations. Sample with: UOPS_DISPATCHED.PORT_7_8",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric roughly estimates the fraction of cycles where the TLB was missed by store accesses, hitting in the second-level TLB (STLB)",
+        "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss",
+        "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_store_group",
+        "MetricName": "tma_store_stlb_hit",
+        "MetricThreshold": "tma_store_stlb_hit > 0.05 & (tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)))",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates the fraction of cycles where the STLB was missed by store accesses, performing a hardware page walk",
+        "MetricExpr": "DTLB_STORE_MISSES.WALK_ACTIVE / tma_info_core_core_clks",
+        "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_store_group",
+        "MetricName": "tma_store_stlb_miss",
+        "MetricThreshold": "tma_store_stlb_miss > 0.05 & (tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)))",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores",
+        "MetricExpr": "9 * OCR.STREAMING_WR.ANY_RESPONSE / tma_info_thread_clks",
+        "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueSmSt;tma_store_bound_group",
+        "MetricName": "tma_streaming_stores",
+        "MetricThreshold": "tma_streaming_stores > 0.2 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
+        "PublicDescription": "This metric estimates how often CPU was stalled due to Streaming store memory accesses; Streaming store optimize out a read request required by RFO stores. Even though store accesses do not typically stall out-of-order CPUs; there are few cases where stores can lead to actual stalls. This metric will be flagged should Streaming stores be a bottleneck. Sample with: OCR.STREAMING_WR.ANY_RESPONSE. Related metrics: tma_fb_full",
+        "ScaleUnit": "100%"
+    },
+    {
+        "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears",
+        "MetricExpr": "10 * BACLEARS.ANY / tma_info_thread_clks",
+        "MetricGroup": "BigFoot;FetchLat;TopdownL4;tma_L4_group;tma_branch_resteers_group",
+        "MetricName": "tma_unknown_branches",
+        "MetricThreshold": "tma_unknown_branches > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
+        "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears. These are fetched branches the Branch Prediction Unit was unable to recognize (e.g. first time the branch is fetched or hitting BTB capacity limit).
Sample with: BACLEARS.AN= Y", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "This metric serves as an approximation of leg= acy x87 usage", + "MetricExpr": "tma_retiring * UOPS_EXECUTED.X87 / UOPS_EXECUTED.TH= READ", + "MetricGroup": "Compute;TopdownL4;tma_L4_group;tma_fp_arith_group", + "MetricName": "tma_x87_use", + "MetricThreshold": "tma_x87_use > 0.1 & (tma_fp_arith > 0.2 & tma_= light_operations > 0.6)", + "PublicDescription": "This metric serves as an approximation of le= gacy x87 usage. It accounts for instructions beyond X87 FP arithmetic opera= tions; hence may be used as a thermometer to avoid X87 high usage and prefe= rably upgrade to modern ISA. See Tip under Tuning Hint.", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Percentage of cycles in aborted transactions.= ", + "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_e= vent(cycles\\-t) else 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_aborted_cycles", + "ScaleUnit": "100%" + }, + { + "BriefDescription": "Number of cycles within a transaction divided= by the number of elisions.", + "MetricExpr": "(cycles\\-t / el\\-start if has_event(cycles\\-t) e= lse 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_cycles_per_elision", + "ScaleUnit": "1cycles / elision" + }, + { + "BriefDescription": "Number of cycles within a transaction divided= by the number of transactions.", + "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) e= lse 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_cycles_per_transaction", + "ScaleUnit": "1cycles / transaction" + }, + { + "BriefDescription": "Percentage of cycles within a transaction reg= ion.", + "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else = 0)", + "MetricGroup": "transaction", + "MetricName": "tsx_transactional_cycles", + "ScaleUnit": "100%" + } +] diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.= json b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json new file mode 100644 index 000000000000..8027590f1776 --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json @@ -0,0 +1,74 @@ +[ + { + "BriefDescription": "Number of entries allocated. Account for Any = type: e.g. Snoop, etc.", + "EventCode": "0x84", + "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle counts number of any coherent requ= est at memory controller that were issued by any core. This event is not su= pported on ICL products but is supported on RKL products.", + "EventCode": "0x85", + "EventName": "UNC_ARB_DAT_OCCUPANCY.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle counts number of coherent reads pe= nding on data return from memory controller that were issued by any core. T= his event is not supported on ICL products but is supported on RKL products= .", + "EventCode": "0x85", + "EventName": "UNC_ARB_DAT_OCCUPANCY.RD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "ARB" + }, + { + "BriefDescription": "Each cycle count number of 'valid' coherent D= ata Read entries . Such entry is defined as valid when it is allocated till= deallocation. Doesn't include prefetches. 
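The four tsx_* metrics above are users of the has_event() guard introduced earlier in this series: when the kernel or firmware does not expose the TSX events (cycles-t, cycles-ct, el-start, tx-start), the guarded expression evaluates to 0 instead of making the whole metric fail to parse. A minimal sketch of the pattern in the same metric grammar, with a hypothetical metric name that is not part of this patch:

    {
        "BriefDescription": "Hypothetical guarded metric: elision starts per cycle, 0 if TSX events are absent.",
        "MetricExpr": "(el\\-start / cycles if has_event(el\\-start) else 0)",
        "MetricGroup": "transaction",
        "MetricName": "tsx_example_guard"
    }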
diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json
new file mode 100644
index 000000000000..8027590f1776
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json
@@ -0,0 +1,74 @@
+[
+    {
+        "BriefDescription": "Number of entries allocated. Account for Any type: e.g. Snoop, etc.",
+        "EventCode": "0x84",
+        "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL",
+        "PerPkg": "1",
+        "UMask": "0x1",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Each cycle counts number of any coherent request at memory controller that were issued by any core. This event is not supported on ICL products but is supported on RKL products.",
+        "EventCode": "0x85",
+        "EventName": "UNC_ARB_DAT_OCCUPANCY.ALL",
+        "PerPkg": "1",
+        "UMask": "0x1",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Each cycle counts number of coherent reads pending on data return from memory controller that were issued by any core. This event is not supported on ICL products but is supported on RKL products.",
+        "EventCode": "0x85",
+        "EventName": "UNC_ARB_DAT_OCCUPANCY.RD",
+        "PerPkg": "1",
+        "UMask": "0x2",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches. This event is not supported on ICL products but is supported on RKL products.",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_REQ_TRK_OCCUPANCY.DRD",
+        "PerPkg": "1",
+        "UMask": "0x2",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of all coherent Data Read entries. Doesn't include prefetches",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_REQ_TRK_REQUEST.DRD",
+        "PerPkg": "1",
+        "UMask": "0x2",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Each cycle counts number of all outgoing valid entries in ReqTrk. Such entry is defined as valid from its allocation in ReqTrk till deallocation. Accounts for Coherent and non-coherent traffic. This event is not supported on ICL products but is supported on RKL products.",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
+        "PerPkg": "1",
+        "UMask": "0x1",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches. This event is not supported on ICL products but is supported on RKL products.",
+        "EventCode": "0x80",
+        "EventName": "UNC_ARB_TRK_OCCUPANCY.RD",
+        "PerPkg": "1",
+        "UMask": "0x2",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Total number of all outgoing entries allocated. Accounts for Coherent and non-coherent traffic.",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_TRK_REQUESTS.ALL",
+        "PerPkg": "1",
+        "UMask": "0x1",
+        "Unit": "ARB"
+    },
+    {
+        "BriefDescription": "Number of all coherent Data Read entries. Doesn't include prefetches. This event is not supported on ICL products but is supported on RKL products.",
+        "EventCode": "0x81",
+        "EventName": "UNC_ARB_TRK_REQUESTS.RD",
+        "PerPkg": "1",
+        "UMask": "0x2",
+        "Unit": "ARB"
+    }
+]
diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json
new file mode 100644
index 000000000000..c6596ba09195
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json
@@ -0,0 +1,9 @@
+[
+    {
+        "BriefDescription": "UNC_CLOCK.SOCKET",
+        "EventCode": "0xff",
+        "EventName": "UNC_CLOCK.SOCKET",
+        "PerPkg": "1",
+        "Unit": "CLOCK"
+    }
+]
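UNC_CLOCK.SOCKET above is the free-running uncore clock, which is the natural denominator for the per-package ARB occupancy and request counts in the preceding file. A hypothetical derived metric, sketched in the same JSON metric grammar used elsewhere in this series (the metric name and description are illustrative assumptions, not part of the patch):

    {
        "BriefDescription": "Hypothetical: ARB tracker requests per uncore clock.",
        "MetricExpr": "UNC_ARB_TRK_REQUESTS.ALL / UNC_CLOCK.SOCKET",
        "MetricName": "arb_requests_per_uncore_clock"
    }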
diff --git a/tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json
new file mode 100644
index 000000000000..b28f62ce1f39
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json
@@ -0,0 +1,165 @@
+[
+    {
+        "BriefDescription": "Loads that miss the DTLB and hit the STLB.",
+        "EventCode": "0x08",
+        "EventName": "DTLB_LOAD_MISSES.STLB_HIT",
+        "PublicDescription": "Counts loads that miss the DTLB (Data TLB) and hit the STLB (Second level TLB).",
+        "SampleAfterValue": "100003",
+        "UMask": "0x20"
+    },
+    {
+        "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a demand load.",
+        "CounterMask": "1",
+        "EventCode": "0x08",
+        "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE",
+        "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a demand load.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)",
+        "EventCode": "0x08",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
+        "PublicDescription": "Counts completed page walks (all page sizes) caused by demand data loads. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0xe"
+    },
+    {
+        "BriefDescription": "Page walks completed due to a demand data load to a 2M/4M page.",
+        "EventCode": "0x08",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M",
+        "PublicDescription": "Counts completed page walks (2M/4M sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Page walks completed due to a demand data load to a 4K page.",
+        "EventCode": "0x08",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K",
+        "PublicDescription": "Counts completed page walks (4K sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of page walks outstanding for a demand load in the PMH each cycle.",
+        "EventCode": "0x08",
+        "EventName": "DTLB_LOAD_MISSES.WALK_PENDING",
+        "PublicDescription": "Counts the number of page walks outstanding for a demand load in the PMH (Page Miss Handler) each cycle.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Stores that miss the DTLB and hit the STLB.",
+        "EventCode": "0x49",
+        "EventName": "DTLB_STORE_MISSES.STLB_HIT",
+        "PublicDescription": "Counts stores that miss the DTLB (Data TLB) and hit the STLB (2nd Level TLB).",
+        "SampleAfterValue": "100003",
+        "UMask": "0x20"
+    },
+    {
+        "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a store.",
+        "CounterMask": "1",
+        "EventCode": "0x49",
+        "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE",
+        "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a store.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Store misses in all TLB levels causes a page walk that completes. (All page sizes)",
+        "EventCode": "0x49",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED",
+        "PublicDescription": "Counts completed page walks (all page sizes) caused by demand data stores. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0xe"
+    },
+    {
+        "BriefDescription": "Page walks completed due to a demand data store to a 2M/4M page.",
+        "EventCode": "0x49",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M",
+        "PublicDescription": "Counts completed page walks (2M/4M sizes) caused by demand data stores. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Page walks completed due to a demand data store to a 4K page.",
+        "EventCode": "0x49",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K",
+        "PublicDescription": "Counts completed page walks (4K sizes) caused by demand data stores. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of page walks outstanding for a store in the PMH each cycle.",
+        "EventCode": "0x49",
+        "EventName": "DTLB_STORE_MISSES.WALK_PENDING",
+        "PublicDescription": "Counts the number of page walks outstanding for a store in the PMH (Page Miss Handler) each cycle.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Instruction fetch requests that miss the ITLB and hit the STLB.",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.STLB_HIT",
+        "PublicDescription": "Counts instruction fetch requests that miss the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).",
+        "SampleAfterValue": "100003",
+        "UMask": "0x20"
+    },
+    {
+        "BriefDescription": "Cycles when at least one PMH is busy with a page walk for code (instruction fetch) request.",
+        "CounterMask": "1",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.WALK_ACTIVE",
+        "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a code (instruction fetch) request.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (All page sizes)",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED",
+        "PublicDescription": "Counts completed page walks (all page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0xe"
+    },
+    {
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (2M/4M)",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M",
+        "PublicDescription": "Counts completed page walks (2M/4M page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4"
+    },
+    {
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (4K)",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED_4K",
+        "PublicDescription": "Counts completed page walks (4K page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2"
+    },
+    {
+        "BriefDescription": "Number of page walks outstanding for an outstanding code request in the PMH each cycle.",
+        "EventCode": "0x85",
+        "EventName": "ITLB_MISSES.WALK_PENDING",
+        "PublicDescription": "Counts the number of page walks outstanding for an outstanding code (instruction fetch) request in the PMH (Page Miss Handler) each cycle.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10"
+    },
+    {
+        "BriefDescription": "DTLB flush attempts of the thread-specific entries",
+        "EventCode": "0xBD",
+        "EventName": "TLB_FLUSH.DTLB_THREAD",
+        "PublicDescription": "Counts the number of DTLB flush attempts of the thread-specific entries.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x1"
+    },
+    {
+        "BriefDescription": "STLB flush attempts",
+        "EventCode": "0xBD",
+        "EventName": "TLB_FLUSH.STLB_ANY",
+        "PublicDescription": "Counts the number of any STLB flush attempts (such as entire, VPID, PCID, InvPage, CR3 write, etc.).",
+        "SampleAfterValue": "100007",
+        "UMask": "0x20"
+    }
+]
--
2.41.0.162.gfafddb0af9-goog

From nobody Sat Feb 7 12:35:08 2026
Date: Fri, 23 Jun 2023 08:10:09 -0700
In-Reply-To: <20230623151016.4193660-1-irogers@google.com>
Message-Id: <20230623151016.4193660-6-irogers@google.com>
References: <20230623151016.4193660-1-irogers@google.com>
Subject: [PATCH v2 05/12] perf vendor events intel: Update meteorlake to 1.03
From: Ian Rogers
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
 Kan Liang, Zhengjun Xing, Kajol Jain, John Garry, Andrii Nakryiko,
 Eduard Zingerman, Jing Zhang, Sohom Datta, linux-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, Perry Taylor, Samantha Alt,
 Caleb Biggers, Weilin Wang, Edward Baker

1.03 events were released in:
https://github.com/intel/perfmon/commit/501a29e88b57e8b01d610168d0101d6181b15e28

It added a lot of events and all uncore events.

Signed-off-by: Ian Rogers
Tested-by: Namhyung Kim
---
 tools/perf/pmu-events/arch/x86/mapfile.csv    |    2 +-
 .../pmu-events/arch/x86/meteorlake/cache.json |  811 +++++++++++
 .../arch/x86/meteorlake/floating-point.json   |  143 ++
 .../arch/x86/meteorlake/frontend.json         |  410 ++++++
 .../arch/x86/meteorlake/memory.json           |  142 +-
 .../pmu-events/arch/x86/meteorlake/other.json |   57 +-
 .../arch/x86/meteorlake/pipeline.json         | 1211 ++++++++++++++++-
 .../arch/x86/meteorlake/uncore-cache.json     |   18 +
 .../x86/meteorlake/uncore-interconnect.json   |   42 +
 .../arch/x86/meteorlake/uncore-memory.json    |  126 ++
 .../arch/x86/meteorlake/virtual-memory.json   |  257 ++++
 11 files changed, 3116 insertions(+), 103 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/floating-point.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/uncore-cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/uncore-interconnect.json
 create mode 100644 tools/perf/pmu-events/arch/x86/meteorlake/uncore-memory.json

diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index a30f2483d99e..de4832bddf05 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -19,7 +19,7 @@ GenuineIntel-6-3A,v24,ivybridge,core
 GenuineIntel-6-3E,v23,ivytown,core
 GenuineIntel-6-2D,v23,jaketown,core
 GenuineIntel-6-(57|85),v10,knightslanding,core
-GenuineIntel-6-A[AC],v1.01,meteorlake,core
+GenuineIntel-6-A[AC],v1.03,meteorlake,core
 GenuineIntel-6-1[AEF],v3,nehalemep,core
 GenuineIntel-6-2E,v3,nehalemex,core
 GenuineIntel-6-A7,v1.01,rocketlake,core
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
index bf24d3f25a3d..e1ae7c92f38e 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
@@ -1,4 +1,114 @@
 [
+    {
+        "BriefDescription": 
"L1D.HWPF_MISS", + "EventCode": "0x51", + "EventName": "L1D.HWPF_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of cache lines replaced in = L1 data cache.", + "EventCode": "0x51", + "EventName": "L1D.REPLACEMENT", + "PublicDescription": "Counts L1D data line replacements including = opportunistic replacements, and replacements that require stall-for-replace= or block-for-replace.", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of cycles a demand request has waited = due to L1D Fill Buffer (FB) unavailability.", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.FB_FULL", + "PublicDescription": "Counts number of cycles a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of phases a demand request has waited = due to L1D Fill Buffer (FB) unavailability.", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS", + "PublicDescription": "Counts number of phases a demand request has= waited due to L1D Fill Buffer (FB) unavailability. Demand requests include= cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of L1D misses that are outstanding", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.PENDING", + "PublicDescription": "Counts number of L1D misses that are outstan= ding in each cycle, that is each cycle the number of Fill Buffers (FB) outs= tanding required by Demand Reads. FB either is held by demand loads, or it = is held by non-demand loads and gets hit at least once by demand. The valid= outstanding interval is defined until the FB deallocation by one of the fo= llowing ways: from FB allocation, if FB is allocated by demand from the dem= and Hit FB, if it is allocated by hardware or software prefetch. Note: In t= he L1D, a Demand Read contains cacheable or noncacheable demand loads, incl= uding ones causing cache-line splits and reads due to page walks resulted f= rom any request type.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles with L1D load Misses outstanding.", + "CounterMask": "1", + "EventCode": "0x48", + "EventName": "L1D_PEND_MISS.PENDING_CYCLES", + "PublicDescription": "Counts duration of L1D miss outstanding in c= ycles.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "L2 cache lines filling L2", + "EventCode": "0x25", + "EventName": "L2_LINES_IN.ALL", + "PublicDescription": "Counts the number of L2 cache lines filling = the L2. Counting does not cover rejects.", + "SampleAfterValue": "100003", + "UMask": "0x1f", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Modified cache lines that are evicted by L2 c= ache when triggered by an L2 cache fill.", + "EventCode": "0x26", + "EventName": "L2_LINES_OUT.NON_SILENT", + "PublicDescription": "Counts the number of lines that are evicted = by L2 cache when triggered by an L2 cache fill. Those lines are in Modified= state. 
Modified lines are written back to L3", + "SampleAfterValue": "200003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Non-modified cache lines that are silently dr= opped by L2 cache when triggered by an L2 cache fill.", + "EventCode": "0x26", + "EventName": "L2_LINES_OUT.SILENT", + "PublicDescription": "Counts the number of lines that are silently= dropped by L2 cache when triggered by an L2 cache fill. These lines are ty= pically in Shared or Exclusive state. A non-threaded event.", + "SampleAfterValue": "200003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "All accesses to L2 cache [This event is alias= to L2_RQSTS.REFERENCES]", + "EventCode": "0x24", + "EventName": "L2_REQUEST.ALL", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. [This event is alias to L2_RQSTS.REFERENCES]", + "SampleAfterValue": "200003", + "UMask": "0xff", + "Unit": "cpu_core" + }, + { + "BriefDescription": "All requests that hit L2 cache. [This event i= s alias to L2_RQSTS.HIT]", + "EventCode": "0x24", + "EventName": "L2_REQUEST.HIT", + "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_RQSTS.HIT]", + "SampleAfterValue": "200003", + "UMask": "0xdf", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Read requests with true-miss in L2 cache [Thi= s event is alias to L2_RQSTS.MISS]", + "EventCode": "0x24", + "EventName": "L2_REQUEST.MISS", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_RQSTS.MISS]", + "SampleAfterValue": "200003", + "UMask": "0x3f", + "Unit": "cpu_core" + }, { "BriefDescription": "L2 code requests", "EventCode": "0x24", @@ -17,6 +127,139 @@ "UMask": "0xe1", "Unit": "cpu_core" }, + { + "BriefDescription": "Demand requests that miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_MISS", + "PublicDescription": "Counts demand requests that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x27", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Demand requests to L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", + "PublicDescription": "Counts demand requests to L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xe7", + "Unit": "cpu_core" + }, + { + "BriefDescription": "L2_RQSTS.ALL_HWPF", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_HWPF", + "SampleAfterValue": "200003", + "UMask": "0xf0", + "Unit": "cpu_core" + }, + { + "BriefDescription": "RFO requests to L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.ALL_RFO", + "PublicDescription": "Counts the total number of RFO (read for own= ership) requests to L2 cache. 
L2 RFO requests include both L1D demand RFO m= isses as well as L1D RFO prefetches.", + "SampleAfterValue": "200003", + "UMask": "0xe2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "L2 cache hits when fetching instructions, cod= e reads.", + "EventCode": "0x24", + "EventName": "L2_RQSTS.CODE_RD_HIT", + "PublicDescription": "Counts L2 cache hits when fetching instructi= ons, code reads.", + "SampleAfterValue": "200003", + "UMask": "0xc4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "L2 cache misses when fetching instructions", + "EventCode": "0x24", + "EventName": "L2_RQSTS.CODE_RD_MISS", + "PublicDescription": "Counts L2 cache misses when fetching instruc= tions.", + "SampleAfterValue": "200003", + "UMask": "0x24", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Demand Data Read requests that hit L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", + "PublicDescription": "Counts the number of demand Data Read reques= ts initiated by load instructions that hit L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xc1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Demand Data Read miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", + "PublicDescription": "Counts demand Data Read requests with true-m= iss in the L2 cache. True-miss excludes misses that were merged with ongoin= g L2 misses. An access is counted once.", + "SampleAfterValue": "200003", + "UMask": "0x21", + "Unit": "cpu_core" + }, + { + "BriefDescription": "All requests that hit L2 cache. [This event i= s alias to L2_REQUEST.HIT]", + "EventCode": "0x24", + "EventName": "L2_RQSTS.HIT", + "PublicDescription": "Counts all requests that hit L2 cache. [This= event is alias to L2_REQUEST.HIT]", + "SampleAfterValue": "200003", + "UMask": "0xdf", + "Unit": "cpu_core" + }, + { + "BriefDescription": "L2_RQSTS.HWPF_MISS", + "EventCode": "0x24", + "EventName": "L2_RQSTS.HWPF_MISS", + "SampleAfterValue": "200003", + "UMask": "0x30", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Read requests with true-miss in L2 cache [Thi= s event is alias to L2_REQUEST.MISS]", + "EventCode": "0x24", + "EventName": "L2_RQSTS.MISS", + "PublicDescription": "Counts read requests of any type with true-m= iss in the L2 cache. True-miss excludes L2 misses that were merged with ong= oing L2 misses. [This event is alias to L2_REQUEST.MISS]", + "SampleAfterValue": "200003", + "UMask": "0x3f", + "Unit": "cpu_core" + }, + { + "BriefDescription": "All accesses to L2 cache [This event is alias= to L2_REQUEST.ALL]", + "EventCode": "0x24", + "EventName": "L2_RQSTS.REFERENCES", + "PublicDescription": "Counts all requests that were hit or true mi= sses in L2 cache. True-miss excludes misses that were merged with ongoing L= 2 misses. 
[This event is alias to L2_REQUEST.ALL]", + "SampleAfterValue": "200003", + "UMask": "0xff", + "Unit": "cpu_core" + }, + { + "BriefDescription": "RFO requests that hit L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.RFO_HIT", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that hit L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0xc2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "RFO requests that miss L2 cache", + "EventCode": "0x24", + "EventName": "L2_RQSTS.RFO_MISS", + "PublicDescription": "Counts the RFO (Read-for-Ownership) requests= that miss L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x22", + "Unit": "cpu_core" + }, + { + "BriefDescription": "L2 writebacks that access L2 cache", + "EventCode": "0x23", + "EventName": "L2_TRANS.L2_WB", + "PublicDescription": "Counts L2 writebacks that access L2 cache.", + "SampleAfterValue": "200003", + "UMask": "0x40", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts the number of cacheable memory request= s that miss in the LLC. Counts on a per core basis.", "EventCode": "0x2e", @@ -53,6 +296,72 @@ "UMask": "0x4f", "Unit": "cpu_core" }, + { + "BriefDescription": "Counts the number of unhalted cycles when the= core is stalled due to an instruction cache or TLB miss.", + "EventCode": "0x35", + "EventName": "MEM_BOUND_STALLS_IFETCH.ALL", + "SampleAfterValue": "1000003", + "UMask": "0x6f", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles the core is stall= ed due to an instruction cache or TLB miss which hit in the L2 cache.", + "EventCode": "0x35", + "EventName": "MEM_BOUND_STALLS_IFETCH.L2_HIT", + "PublicDescription": "Counts the number of cycles the core is stal= led due to an instruction cache or Translation Lookaside Buffer (TLB) miss = which hit in the L2 cache.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of unhalted cycles when the= core is stalled due to an icache or itlb miss which hit in the LLC.", + "EventCode": "0x35", + "EventName": "MEM_BOUND_STALLS_IFETCH.LLC_HIT", + "SampleAfterValue": "1000003", + "UMask": "0x6", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of unhalted cycles when the= core is stalled due to an icache or itlb miss which missed all the caches.= ", + "EventCode": "0x35", + "EventName": "MEM_BOUND_STALLS_IFETCH.LLC_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x68", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of unhalted cycles when the= core is stalled due to an L1 demand load miss.", + "EventCode": "0x34", + "EventName": "MEM_BOUND_STALLS_LOAD.ALL", + "SampleAfterValue": "1000003", + "UMask": "0x6f", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles the core is stall= ed due to a demand load which hit in the L2 cache.", + "EventCode": "0x34", + "EventName": "MEM_BOUND_STALLS_LOAD.L2_HIT", + "PublicDescription": "Counts the number of cycles a core is stalle= d due to a demand load which hit in the L2 cache.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of unhalted cycles when the= core is stalled due to a demand load miss which hit in the LLC.", + "EventCode": "0x34", + "EventName": "MEM_BOUND_STALLS_LOAD.LLC_HIT", + "SampleAfterValue": "1000003", + "UMask": "0x6", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of unhalted cycles when the= core is 
stalled due to a demand load miss which missed all the local cache= s.", + "EventCode": "0x34", + "EventName": "MEM_BOUND_STALLS_LOAD.LLC_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x68", + "Unit": "cpu_atom" + }, { "BriefDescription": "Retired load instructions.", "Data_LA": "1", @@ -75,6 +384,352 @@ "UMask": "0x82", "Unit": "cpu_core" }, + { + "BriefDescription": "All retired memory instructions.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.ANY", + "PEBS": "1", + "PublicDescription": "Counts all retired memory instructions - loa= ds and stores.", + "SampleAfterValue": "1000003", + "UMask": "0x83", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions with locked access.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.LOCK_LOADS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with locked= access.", + "SampleAfterValue": "100007", + "UMask": "0x21", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions that split across a= cacheline boundary.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions that split = across a cacheline boundary.", + "SampleAfterValue": "100003", + "UMask": "0x41", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired store instructions that split across = a cacheline boundary.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.SPLIT_STORES", + "PEBS": "1", + "PublicDescription": "Counts retired store instructions that split= across a cacheline boundary.", + "SampleAfterValue": "100003", + "UMask": "0x42", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions that hit the STLB.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_HIT_LOADS", + "PEBS": "1", + "PublicDescription": "Number of retired load instructions with a c= lean hit in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x9", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired store instructions that hit the STLB.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_HIT_STORES", + "PEBS": "1", + "PublicDescription": "Number of retired store instructions that hi= t in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0xa", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions that miss the STLB.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", + "PEBS": "1", + "PublicDescription": "Number of retired load instructions that (st= art a) miss in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x11", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired store instructions that miss the STLB= .", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", + "PEBS": "1", + "PublicDescription": "Number of retired store instructions that (s= tart a) miss in the 2nd-level TLB (STLB).", + "SampleAfterValue": "100003", + "UMask": "0x12", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Completed demand load uops that miss the L1 d= -cache.", + "EventCode": "0x43", + "EventName": "MEM_LOAD_COMPLETED.L1_MISS_ANY", + "PublicDescription": "Number of completed demand load requests tha= t missed the L1 data cache including shadow misses (FB hits, merge to an on= going 
L1D miss)", + "SampleAfterValue": "1000003", + "UMask": "0xfd", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were HitM responses from shared L3", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3.", + "SampleAfterValue": "20011", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were L3 and cross-core snoop hits in on-pkg core cache", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache.", + "SampleAfterValue": "20011", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were HitM responses from shared L3", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were HitM responses from shared L3.", + "SampleAfterValue": "20011", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were hits in L3 without snoops required", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were hits in L3 without snoops required.", + "SampleAfterValue": "100003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions whose data sources = were L3 and cross-core snoop hits in on-pkg core cache", + "Data_LA": "1", + "EventCode": "0xd2", + "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions whose data = sources were L3 and cross-core snoop hits in on-pkg core cache.", + "SampleAfterValue": "20011", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions which data sources = missed L3 but serviced from local dram", + "Data_LA": "1", + "EventCode": "0xd3", + "EventName": "MEM_LOAD_L3_MISS_RETIRED.LOCAL_DRAM", + "PEBS": "1", + "PublicDescription": "Retired load instructions which data sources= missed L3 but serviced from local DRAM.", + "SampleAfterValue": "100007", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions with at least 1 uncachea= ble load or lock.", + "Data_LA": "1", + "EventCode": "0xd4", + "EventName": "MEM_LOAD_MISC_RETIRED.UC", + "PEBS": "1", + "PublicDescription": "Retired instructions with at least one load = to uncacheable memory-type, or at least one cache-line split locked access = (Bus Lock).", + "SampleAfterValue": "100007", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of completed demand load requests that= missed the L1, but hit the FB(fill buffer), because a preceding miss to th= e same cacheline initiated the line to be brought into L1, but data is not = yet ready in L1.", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.FB_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop was load missed in L1 but hit FB (Fill Buffers) due to 
preceding= miss to the same cache line with data not ready.", + "SampleAfterValue": "100007", + "UMask": "0x40", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions with L1 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L1_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L1 data cache. This event includes all SW prefet= ches and lock instructions regardless of the data source.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions missed L1 cache as = data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L1_MISS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L1 cache.", + "SampleAfterValue": "200003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions with L2 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L2_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with L2 cac= he hits as data sources.", + "SampleAfterValue": "200003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions missed L2 cache as = data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L2_MISS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions missed L2 c= ache as data sources.", + "SampleAfterValue": "100021", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions with L3 cache hits = as data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L3_HIT", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that hit in the L3 cache.", + "SampleAfterValue": "100021", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired load instructions missed L3 cache as = data sources", + "Data_LA": "1", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_RETIRED.L3_MISS", + "PEBS": "1", + "PublicDescription": "Counts retired load instructions with at lea= st one uop that missed in the L3 cache.", + "SampleAfterValue": "50021", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of load ops retired that mi= ss the L3 cache and hit in DRAM", + "EventCode": "0xd4", + "EventName": "MEM_LOAD_UOPS_MISC_RETIRED.LOCAL_DRAM", + "PEBS": "1", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of load ops retired that hi= t the L1 data cache.", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_UOPS_RETIRED.L1_HIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of load ops retired that mi= ss in the L1 data cache.", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_UOPS_RETIRED.L1_MISS", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x40", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of load ops retired that hi= t in the L2 cache.", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_UOPS_RETIRED.L2_HIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x2", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts 
the number of load ops retired that mi= ss in the L2 cache.", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_UOPS_RETIRED.L2_MISS", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x80", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of load ops retired that hi= t in the L3 cache.", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_UOPS_RETIRED.L3_HIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x1c", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of loads that hit in a writ= e combining buffer (WCB), excluding the first load that caused the WCB to a= llocate.", + "EventCode": "0xd1", + "EventName": "MEM_LOAD_UOPS_RETIRED.WCB_HIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x20", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that uops are blo= cked for any of the following reasons: load buffer, store buffer or RSV fu= ll.", + "EventCode": "0x04", + "EventName": "MEM_SCHEDULER_BLOCK.ALL", + "SampleAfterValue": "20003", + "UMask": "0x7", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that uops are blo= cked due to a load buffer full condition.", + "EventCode": "0x04", + "EventName": "MEM_SCHEDULER_BLOCK.LD_BUF", + "SampleAfterValue": "20003", + "UMask": "0x2", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that uops are blo= cked due to an RSV full condition.", + "EventCode": "0x04", + "EventName": "MEM_SCHEDULER_BLOCK.RSV", + "SampleAfterValue": "20003", + "UMask": "0x4", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that uops are blo= cked due to a store buffer full condition.", + "EventCode": "0x04", + "EventName": "MEM_SCHEDULER_BLOCK.ST_BUF", + "SampleAfterValue": "20003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of load ops retired.", "Data_LA": "1", @@ -95,6 +750,18 @@ "UMask": "0x82", "Unit": "cpu_atom" }, + { + "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_1024", + "MSRIndex": "0x3F6", + "MSRValue": "0x400", + "PEBS": "2", + "SampleAfterValue": "1000003", + "UMask": "0x5", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", "Data_LA": "1", @@ -119,6 +786,18 @@ "UMask": "0x5", "Unit": "cpu_atom" }, + { + "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_UOPS_RETIRED.LOAD_LATENCY_GT_2048", + "MSRIndex": "0x3F6", + "MSRValue": "0x800", + "PEBS": "2", + "SampleAfterValue": "1000003", + "UMask": "0x5", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of tagged load uops retired= that exceed the latency threshold defined in MEC_CR_PEBS_LD_LAT_THRESHOLD = - Only counts with PEBS enabled.", "Data_LA": "1", @@ -191,6 +870,46 @@ "UMask": "0x5", "Unit": "cpu_atom" }, + { + "BriefDescription": "Counts the number of load uops retired that p= erformed one or more locks", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": 
"MEM_UOPS_RETIRED.LOCK_LOADS", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x21", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of memory uops retired that= were splits.", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_UOPS_RETIRED.SPLIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x43", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of retired split load uops.= ", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_UOPS_RETIRED.SPLIT_LOADS", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x41", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of retired split store uops= .", + "Data_LA": "1", + "EventCode": "0xd0", + "EventName": "MEM_UOPS_RETIRED.SPLIT_STORES", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x42", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of stores uops retired sam= e as MEM_UOPS_RETIRED.ALL_STORES", "Data_LA": "1", @@ -200,5 +919,97 @@ "SampleAfterValue": "1000003", "UMask": "0x6", "Unit": "cpu_atom" + }, + { + "BriefDescription": "Retired memory uops for any access", + "EventCode": "0xe5", + "EventName": "MEM_UOP_RETIRED.ANY", + "PublicDescription": "Number of retired micro-operations (uops) fo= r load or store memory accesses", + "SampleAfterValue": "1000003", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts demand data reads that resulted in a s= noop hit in another cores caches, data forwarding is required as the data i= s modified.", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts demand data reads that resulted in a s= noop hit in another cores caches which forwarded the unmodified data to the= requesting core.", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x8003C0001", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts demand read for ownership (RFO) reques= ts and software prefetches for exclusive ownership (PREFETCHW) that resulte= d in a snoop hit in another cores caches, data forwarding is required as th= e data is modified.", + "EventCode": "0x2A,0x2B", + "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM", + "MSRIndex": "0x1a6,0x1a7", + "MSRValue": "0x10003C0002", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Any memory transaction that reached the SQ.", + "EventCode": "0x21", + "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", + "PublicDescription": "Counts memory transactions reached the super= queue including requests initiated by the core, all L3 prefetches, page wa= lks, etc..", + "SampleAfterValue": "100003", + "UMask": "0x80", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Demand and prefetch data reads", + "EventCode": "0x21", + "EventName": "OFFCORE_REQUESTS.DATA_RD", + "PublicDescription": "Counts the demand and prefetch data reads. A= ll Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 p= refetchers). 
Counting also covers reads due to page walks resulted from any request type.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x8",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Demand Data Read requests sent to uncore",
+        "EventCode": "0x21",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD",
+        "PublicDescription": "Counts the Demand Data Read requests sent to uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determine average latency in the uncore.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Demand RFO requests including regular RFOs, locks, ItoM",
+        "EventCode": "0x21",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_RFO",
+        "PublicDescription": "Counts the demand RFO (read for ownership) requests including regular RFOs, locks, ItoM.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x4",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts bus locks, accounts for cache line split locks and UC locks.",
+        "EventCode": "0x2c",
+        "EventName": "SQ_MISC.BUS_LOCK",
+        "PublicDescription": "Counts the more expensive bus lock needed to enforce cache coherency for certain memory accesses that need to be done atomically. Can be created by issuing an atomic instruction (via the LOCK prefix) which causes a cache line split or accesses uncacheable memory.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x10",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts the number of issue slots every cycle that were not delivered by the frontend due to an icache miss",
+        "EventCode": "0x71",
+        "EventName": "TOPDOWN_FE_BOUND.ICACHE",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x20",
+        "Unit": "cpu_atom"
+    }
 ]
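The FP_ARITH_INST_RETIRED umasks in the new floating-point.json below count instructions per element width, not FLOPs, and the descriptions note that DPP and FM(N)ADD/SUB retire two counts per instruction. A FLOP estimate therefore weights each umask by its element count; the 4_FLOPS umask (0x18) already merges the two 4-element cases (128-bit packed single and 256-bit packed double). A sketch of such a derived metric in the same JSON grammar (the name is hypothetical and not part of this patch):

    {
        "BriefDescription": "Hypothetical FLOP estimate: 1*SCALAR + 2*128B_PACKED_DOUBLE + 4*4_FLOPS + 8*256B_PACKED_SINGLE.",
        "MetricExpr": "FP_ARITH_INST_RETIRED.SCALAR + 2 * FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * FP_ARITH_INST_RETIRED.4_FLOPS + 8 * FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE",
        "MetricName": "hypothetical_flop_count"
    }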
DPP and FM(N)ADD/SUB instructions count twice = as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 2 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as the= y perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR re= gister need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of SSE/AVX computational 128-bit packe= d single precision floating-point instructions retired; some instructions w= ill count twice as noted below. Each count represents 4 computation operat= ions, one for each element. Applies to SSE* and AVX* packed single precisi= on floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT = DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they pe= rform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 256-bi= t packed double precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 4 computation= operations, one for each element. Applies to SSE* and AVX* packed double = precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN= MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perf= orm 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed double precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 4 computation opera= tions, one for each element. Applies to SSE* and AVX* packed double precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 = calculations per element. The DAZ and FTZ flags in the MXCSR register need = to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational 256-bi= t packed single precision floating-point instructions retired; some instruc= tions will count twice as noted below. Each count represents 8 computation= operations, one for each element. 
Applies to SSE* and AVX* packed single = precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN= MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", + "PublicDescription": "Number of SSE/AVX computational 256-bit pack= ed single precision floating-point instructions retired; some instructions = will count twice as noted below. Each count represents 8 computation opera= tions, one for each element. Applies to SSE* and AVX* packed single precis= ion floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX S= QRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count tw= ice as they perform 2 calculations per element. The DAZ and FTZ flags in th= e MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of SSE/AVX computational 128-bit packe= d single and 256-bit packed double precision FP instructions retired; some = instructions will count twice as noted below. Each count represents 2 or/a= nd 4 computation operations, 1 for each element. Applies to SSE* and AVX* = packed single precision and packed double precision FP instructions: ADD SU= B HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DP= P and FM(N)ADD/SUB count twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", + "PublicDescription": "Number of SSE/AVX computational 128-bit pack= ed single precision and 256-bit packed double precision floating-point ins= tructions retired; some instructions will count twice as noted below. Each= count represents 2 or/and 4 computation operations, one for each element. = Applies to SSE* and AVX* packed single precision floating-point and packed= double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL= DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB ins= tructions count twice as they perform 2 calculations per element. The DAZ a= nd FTZ flags in the MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x18", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of SSE/AVX computational scalar floati= ng-point instructions retired; some instructions will count twice as noted = below. Applies to SSE* and AVX* scalar, double and single precision floati= ng-point: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB= . DPP and FM(N)ADD/SUB instructions count twice as they perform multiple c= alculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.SCALAR", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision and double precision floating-point instructions retired; some = instructions will count twice as noted below. Each count represents 1 comp= utational operation. Applies to SSE* and AVX* scalar single precision float= ing-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB= . FM(N)ADD/SUB instructions count twice as they perform 2 calculations per= element. 
The DAZ and FTZ flags in the MXCSR register need to be set when u= sing these events.", + "SampleAfterValue": "1000003", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational scalar= double precision floating-point instructions retired; some instructions wi= ll count twice as noted below. Each count represents 1 computational opera= tion. Applies to SSE* and AVX* scalar double precision floating-point instr= uctions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructi= ons count twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", + "PublicDescription": "Number of SSE/AVX computational scalar doubl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar double precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions co= unt twice as they perform 2 calculations per element. The DAZ and FTZ flags= in the MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts number of SSE/AVX computational scalar= single precision floating-point instructions retired; some instructions wi= ll count twice as noted below. Each count represents 1 computational opera= tion. Applies to SSE* and AVX* scalar single precision floating-point instr= uctions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB= instructions count twice as they perform 2 calculations per element.", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", + "PublicDescription": "Number of SSE/AVX computational scalar singl= e precision floating-point instructions retired; some instructions will cou= nt twice as noted below. Each count represents 1 computational operation. = Applies to SSE* and AVX* scalar single precision floating-point instruction= s: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instr= uctions count twice as they perform 2 calculations per element. The DAZ and= FTZ flags in the MXCSR register need to be set when using these events.", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of any Vector retired FP arithmetic in= structions", + "EventCode": "0xc7", + "EventName": "FP_ARITH_INST_RETIRED.VECTOR", + "PublicDescription": "Number of any Vector retired FP arithmetic i= nstructions. 
The DAZ and FTZ flags in the MXCSR register need to be set wh= en using these events.", + "SampleAfterValue": "1000003", + "UMask": "0xfc", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of floating point operation= s retired that required microcode assist.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.FP_ASSIST", + "PublicDescription": "Counts the number of floating point operatio= ns retired that required microcode assist, which is not a reflection of the= number of FP operations, instructions or uops.", + "SampleAfterValue": "20003", + "UMask": "0x4", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of floating point divide uo= ps retired (x87 and sse, including x87 sqrt).", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.FPDIV", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x8", + "Unit": "cpu_atom" + } +] diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json b/tool= s/perf/pmu-events/arch/x86/meteorlake/frontend.json index 66e5609699ea..0f064518d1c0 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json @@ -1,4 +1,259 @@ [ + { + "BriefDescription": "Counts the total number of BACLEARS due to al= l branch types including conditional and unconditional jumps, returns, and = indirect branches.", + "EventCode": "0xe6", + "EventName": "BACLEARS.ANY", + "PublicDescription": "Counts the total number of BACLEARS, which o= ccur when the Branch Target Buffer (BTB) prediction or lack thereof, was co= rrected by a later branch predictor in the frontend. Includes BACLEARS due= to all branch types including conditional and unconditional jumps, returns= , and indirect branches.", + "SampleAfterValue": "200003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Clears due to Unknown Branches.", + "EventCode": "0x60", + "EventName": "BACLEARS.ANY", + "PublicDescription": "Number of times the front-end is resteered w= hen it finds a branch instruction in a fetch line. This is called Unknown B= ranch which occurs for the first time a branch instruction is fetched or wh= en the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction.", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", + "SampleAfterValue": "500009", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles the Microcode Sequencer is busy.", + "EventCode": "0x87", + "EventName": "DECODE.MS_BUSY", + "SampleAfterValue": "500009", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "DSB-to-MITE switch true penalty cycles.", + "EventCode": "0x61", + "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", + "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache th= at holds translations of previously fetched instructions that were decoded = by the legacy x86 decode pipeline (MITE). 
This event counts fetch penalty cycles when a transition occurs from DSB to MITE.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired ANT branches",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.ANY_ANT",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x9",
+        "PEBS": "1",
+        "PublicDescription": "Always Not Taken (ANT) conditional retired branches (no BTB entry and not mispredicted)",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts the number of instructions retired that were tagged because empty issue slots were seen before the uop due to ITLB miss",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.ITLB_MISS",
+        "PEBS": "1",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x10",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced an iTLB true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.ITLB_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x14",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced an iTLB (Instruction TLB) true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired Instructions that experienced an Instruction L1 Cache true miss.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.L1I_MISS",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x12",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired Instructions that experienced an Instruction L1 Cache true miss.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired instructions after front-end starvation of at least 1 cycle",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_1",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x600106",
+        "PEBS": "1",
+        "PublicDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of at least 1 cycle which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_128",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x608006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.",
+        "SampleAfterValue": "100007",
+        "UMask": "0x3",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 16 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xc6",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_16",
+        "MSRIndex": "0x3F7",
+        "MSRValue": "0x601006",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 16 cycles. 
During th= is period the front-end delivered no uops.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions after front-end starvati= on of at least 2 cycles", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_2", + "MSRIndex": "0x3F7", + "MSRValue": "0x600206", + "PEBS": "1", + "PublicDescription": "Retired instructions that are fetched after = an interval where the front-end delivered no uops for a period of at least = 2 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 256 cycles= which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_256", + "MSRIndex": "0x3F7", + "MSRValue": "0x610006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 25= 6 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end had at least 1 bubble-slot for a period of 2= cycles which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1", + "MSRIndex": "0x3F7", + "MSRValue": "0x100206", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after the front-end had at least 1 bubble-slot for a per= iod of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there = was no RAT stall.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 32 cycles = which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_32", + "MSRIndex": "0x3F7", + "MSRValue": "0x602006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 32 cycles. 
During th= is period the front-end delivered no uops.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 4 cycles w= hich was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_4", + "MSRIndex": "0x3F7", + "MSRValue": "0x600406", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 4 = cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 512 cycles= which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_512", + "MSRIndex": "0x3F7", + "MSRValue": "0x620006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 51= 2 cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 64 cycles = which was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_64", + "MSRIndex": "0x3F7", + "MSRValue": "0x604006", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are fetched= after an interval where the front-end delivered no uops for a period of 64= cycles which was not interrupted by a back-end stall.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired instructions that are fetched after a= n interval where the front-end delivered no uops for a period of 8 cycles w= hich was not interrupted by a back-end stall.", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.LATENCY_GE_8", + "MSRIndex": "0x3F7", + "MSRValue": "0x600806", + "PEBS": "1", + "PublicDescription": "Counts retired instructions that are deliver= ed to the back-end after a front-end stall of at least 8 cycles. 
During thi= s period the front-end delivered no uops.", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Mispredicted Retired ANT branches", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.MISP_ANT", + "MSRIndex": "0x3F7", + "MSRValue": "0x9", + "PEBS": "1", + "PublicDescription": "ANT retired branches that got just mispredic= ted", + "SampleAfterValue": "100007", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "FRONTEND_RETIRED.MS_FLOWS", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.MS_FLOWS", + "MSRIndex": "0x3F7", + "MSRValue": "0x8", + "PEBS": "1", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "FRONTEND_RETIRED.UNKNOWN_BRANCH", + "EventCode": "0xc6", + "EventName": "FRONTEND_RETIRED.UNKNOWN_BRANCH", + "MSRIndex": "0x3F7", + "MSRValue": "0x17", + "PEBS": "1", + "SampleAfterValue": "100007", + "UMask": "0x3", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts every time the code stream enters into= a new cache line by walking sequential from the previous line or being red= irected by a jump.", "EventCode": "0x80", @@ -15,6 +270,131 @@ "UMask": "0x2", "Unit": "cpu_atom" }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss.", + "EventCode": "0x80", + "EventName": "ICACHE_DATA.STALLS", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The decode pipeline works at a 32= Byte granularity.", + "SampleAfterValue": "500009", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "ICACHE_DATA.STALL_PERIODS", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x80", + "EventName": "ICACHE_DATA.STALL_PERIODS", + "SampleAfterValue": "500009", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Instruction fetch tag lookups that hit in the= instruction cache (L1I). Counts at 64-byte cache-line granularity.", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.HIT", + "PublicDescription": "Counts instruction fetch tag lookups that hi= t in the instruction cache (L1I). Counts at 64-byte cache-line granularity.= Accounts for both cacheable and uncacheable accesses.", + "SampleAfterValue": "200003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss.", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", + "SampleAfterValue": "200003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng any Uop", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath.", + "SampleAfterValue": "2000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles DSB is delivering optimal number of Uo= ps", + "CounterMask": "6", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_OK", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. 
During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops delivered to Instruction Decode Queue (I= DQ) from the Decode Stream Buffer (DSB) path", + "EventCode": "0x79", + "EventName": "IDQ.DSB_UOPS", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", + "SampleAfterValue": "2000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles MITE is delivering any Uop", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.MITE_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delive= red to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipe= line) path. During these cycles uops are not being delivered from the Decod= e Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles MITE is delivering optimal number of U= ops", + "CounterMask": "6", + "EventCode": "0x79", + "EventName": "IDQ.MITE_CYCLES_OK", + "PublicDescription": "Counts the number of cycles where optimal nu= mber of uops was delivered to the Instruction Decode Queue (IDQ) from the M= ITE (legacy decode pipeline) path. During these cycles uops are not being d= elivered from the Decode Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops delivered to Instruction Decode Queue (I= DQ) from MITE path", + "EventCode": "0x79", + "EventName": "IDQ.MITE_UOPS", + "PublicDescription": "Counts the number of uops delivered to Instr= uction Decode Queue (IDQ) from the MITE path. This also means that uops are= not being delivered from the Decode Stream Buffer (DSB).", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles when uops are being delivered to IDQ w= hile MS is busy", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.MS_CYCLES_ANY", + "PublicDescription": "Counts cycles during which uops are being de= livered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS= ) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.", + "SampleAfterValue": "2000003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of switches from DSB or MITE to the MS= ", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0x79", + "EventName": "IDQ.MS_SWITCHES", + "PublicDescription": "Number of switches from DSB (Decode Stream B= uffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.", + "SampleAfterValue": "100003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops initiated by MITE or Decode Stream Buffe= r (DSB) and delivered to Instruction Decode Queue (IDQ) while Microcode Seq= uencer (MS) is busy", + "EventCode": "0x79", + "EventName": "IDQ.MS_UOPS", + "PublicDescription": "Counts the number of uops initiated by MITE = or Decode Stream Buffer (DSB) and delivered to Instruction Decode Queue (ID= Q) while the Microcode Sequencer (MS) is busy. 
Counting includes uops that may 'bypass' the IDQ.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x20",
+        "Unit": "cpu_core"
+    },
     {
         "BriefDescription": "This event counts a subset of the Topdown Slots event in which no operation was delivered to the back-end pipeline due to instruction fetch limitations when the back-end could have accepted more operations. Common examples include instruction cache misses or x86 instruction decode limitations.",
         "EventCode": "0x9c",
@@ -23,5 +403,35 @@
         "SampleAfterValue": "1000003",
         "UMask": "0x1",
         "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Uops not delivered by IDQ when the backend of the machine is not stalled",
+        "EventCode": "0x9c",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE",
+        "PublicDescription": "Counts the number of uops not delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there were no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles when no uops are delivered by the IDQ when the backend of the machine is not stalled",
+        "CounterMask": "6",
+        "EventCode": "0x9c",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE",
+        "PublicDescription": "Counts the number of cycles when no uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there were no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "SampleAfterValue": "1000003",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled",
+        "CounterMask": "1",
+        "EventCode": "0x9c",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK",
+        "Invert": "1",
+        "PublicDescription": "Counts the number of cycles when the optimal number of uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there were no back-end stalls. 
This event cou= nts for one SMT thread in a given cycle.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_core" } ] diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json b/tools/= perf/pmu-events/arch/x86/meteorlake/memory.json index 20c2efe70eeb..67e949b4c789 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json @@ -1,4 +1,100 @@ [ + { + "BriefDescription": "Cycles while L3 cache miss demand load is out= standing.", + "CounterMask": "2", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L3_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Execution stalls while L3 cache miss demand l= oad is outstanding.", + "CounterMask": "6", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x6", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of cycles that the head (ol= dest load) of the load buffer is stalled due to any number of reasons, incl= uding an L1 miss, WCB full, pagewalk, store address block or store data blo= ck, on a load that retires.", + "EventCode": "0x05", + "EventName": "LD_HEAD.ANY_AT_RET", + "SampleAfterValue": "1000003", + "UMask": "0xff", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that the head (ol= dest load) of the load buffer is stalled due to a core bound stall includin= g a store address match, a DTLB miss or a page walk that detains the load f= rom retiring.", + "EventCode": "0x05", + "EventName": "LD_HEAD.L1_BOUND_AT_RET", + "SampleAfterValue": "1000003", + "UMask": "0xf4", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that the head (ol= dest load) of the load buffer and retirement are both stalled due to a DL1 = miss.", + "EventCode": "0x05", + "EventName": "LD_HEAD.L1_MISS_AT_RET", + "SampleAfterValue": "1000003", + "UMask": "0x81", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that the head (ol= dest load) of the load buffer and retirement are both stalled due to other = block cases.", + "EventCode": "0x05", + "EventName": "LD_HEAD.OTHER_AT_RET", + "PublicDescription": "Counts the number of cycles that the head (o= ldest load) of the load buffer and retirement are both stalled due to other= block cases such as pipeline conflicts, fences, etc.", + "SampleAfterValue": "1000003", + "UMask": "0xc0", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that the head (ol= dest load) of the load buffer and retirement are both stalled due to a page= walk.", + "EventCode": "0x05", + "EventName": "LD_HEAD.PGWALK_AT_RET", + "SampleAfterValue": "1000003", + "UMask": "0xa0", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of cycles that the head (ol= dest load) of the load buffer and retirement are both stalled due to a stor= e address match.", + "EventCode": "0x05", + "EventName": "LD_HEAD.ST_ADDR_AT_RET", + "SampleAfterValue": "1000003", + "UMask": "0x84", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Execution stalls while L1 cache miss demand l= oad is outstanding.", + "CounterMask": "3", + "EventCode": "0x47", + "EventName": "MEMORY_ACTIVITY.STALLS_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Execution stalls while L2 cache miss demand c= acheable load request is outstanding.", + 
"CounterMask": "5", + "EventCode": "0x47", + "EventName": "MEMORY_ACTIVITY.STALLS_L2_MISS", + "PublicDescription": "Execution stalls while L2 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", + "SampleAfterValue": "1000003", + "UMask": "0x5", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Execution stalls while L3 cache miss demand c= acheable load request is outstanding.", + "CounterMask": "9", + "EventCode": "0x47", + "EventName": "MEMORY_ACTIVITY.STALLS_L3_MISS", + "PublicDescription": "Execution stalls while L3 cache miss demand = cacheable load request is outstanding (will not count for uncacheable deman= d requests e.g. bus lock).", + "SampleAfterValue": "1000003", + "UMask": "0x9", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts randomly selected loads when the laten= cy from first dispatch to completion is greater than 128 cycles.", "Data_LA": "1", @@ -115,43 +211,29 @@ "Unit": "cpu_core" }, { - "BriefDescription": "Counts demand data reads that were not suppli= ed by the L3 cache.", - "EventCode": "0xB7", - "EventName": "OCR.DEMAND_DATA_RD.L3_MISS", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x3FBFC00001", - "SampleAfterValue": "100003", - "UMask": "0x1", + "BriefDescription": "Counts misaligned loads that are 4K page spli= ts.", + "EventCode": "0x13", + "EventName": "MISALIGN_MEM_REF.LOAD_PAGE_SPLIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x2", "Unit": "cpu_atom" }, { - "BriefDescription": "Counts demand data reads that were not suppli= ed by the L3 cache.", - "EventCode": "0x2A,0x2B", - "EventName": "OCR.DEMAND_DATA_RD.L3_MISS", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x3FBFC00001", - "SampleAfterValue": "100003", - "UMask": "0x1", - "Unit": "cpu_core" - }, - { - "BriefDescription": "Counts demand reads for ownership (RFO) and s= oftware prefetches for exclusive ownership (PREFETCHW) that were not suppli= ed by the L3 cache.", - "EventCode": "0xB7", - "EventName": "OCR.DEMAND_RFO.L3_MISS", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x3FBFC00002", - "SampleAfterValue": "100003", - "UMask": "0x1", + "BriefDescription": "Counts misaligned stores that are 4K page spl= its.", + "EventCode": "0x13", + "EventName": "MISALIGN_MEM_REF.STORE_PAGE_SPLIT", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x4", "Unit": "cpu_atom" }, { - "BriefDescription": "Counts demand read for ownership (RFO) reques= ts and software prefetches for exclusive ownership (PREFETCHW) that were no= t supplied by the L3 cache.", - "EventCode": "0x2A,0x2B", - "EventName": "OCR.DEMAND_RFO.L3_MISS", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x3FBFC00002", + "BriefDescription": "Counts demand data read requests that miss th= e L3 cache.", + "EventCode": "0x21", + "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", "SampleAfterValue": "100003", - "UMask": "0x1", + "UMask": "0x10", "Unit": "cpu_core" } ] diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/other.json b/tools/p= erf/pmu-events/arch/x86/meteorlake/other.json index 14e648bf11c5..2ec57f487525 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/other.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/other.json @@ -1,41 +1,50 @@ [ { - "BriefDescription": "Counts demand data reads that have any type o= f response.", - "EventCode": "0xB7", - "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE", + "BriefDescription": "Counts streaming stores that have any type of= response.", + "EventCode": "0x2A,0x2B", + "EventName": 
"OCR.STREAMING_WR.ANY_RESPONSE", "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x10001", + "MSRValue": "0x10800", "SampleAfterValue": "100003", "UMask": "0x1", - "Unit": "cpu_atom" + "Unit": "cpu_core" }, { - "BriefDescription": "Counts demand data reads that have any type o= f response.", - "EventCode": "0x2A,0x2B", - "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x10001", - "SampleAfterValue": "100003", - "UMask": "0x1", + "BriefDescription": "Cycles when Reservation Station (RS) is empty= for the thread.", + "EventCode": "0xa5", + "EventName": "RS.EMPTY", + "PublicDescription": "Counts cycles during which the reservation s= tation (RS) is empty for this logical processor. This is usually caused whe= n the front-end pipeline runs into starvation periods (e.g. branch mispredi= ctions or i-cache misses)", + "SampleAfterValue": "1000003", + "UMask": "0x7", "Unit": "cpu_core" }, { - "BriefDescription": "Counts demand reads for ownership (RFO) and s= oftware prefetches for exclusive ownership (PREFETCHW) that have any type o= f response.", - "EventCode": "0xB7", - "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x10002", + "BriefDescription": "Counts end of periods where the Reservation S= tation (RS) was empty.", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0xa5", + "EventName": "RS.EMPTY_COUNT", + "Invert": "1", + "PublicDescription": "Counts end of periods where the Reservation = Station (RS) was empty. Could be useful to closely sample on front-end late= ncy issues (see the FRONTEND_RETIRED event of designated precise events)", "SampleAfterValue": "100003", - "UMask": "0x1", + "UMask": "0x7", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of issue slots in a UMWAIT = or TPAUSE instruction where no uop issues due to the instruction putting th= e CPU into the C0.1 activity state. For Tremont, UMWAIT and TPAUSE will onl= y put the CPU into C0.1 activity state (not C0.2 activity state)", + "EventCode": "0x75", + "EventName": "SERIALIZATION.C01_MS_SCB", + "SampleAfterValue": "200003", + "UMask": "0x4", "Unit": "cpu_atom" }, { - "BriefDescription": "Counts demand read for ownership (RFO) reques= ts and software prefetches for exclusive ownership (PREFETCHW) that have an= y type of response.", - "EventCode": "0x2A,0x2B", - "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE", - "MSRIndex": "0x1a6,0x1a7", - "MSRValue": "0x10002", - "SampleAfterValue": "100003", + "BriefDescription": "Cycles the uncore cannot take further request= s", + "CounterMask": "1", + "EventCode": "0x2d", + "EventName": "XQ.FULL_CYCLES", + "PublicDescription": "number of cycles when the thread is active a= nd the uncore cannot take any further requests (for example prefetches, loa= ds or stores initiated by the Core that miss the L2 cache).", + "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" } diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tool= s/perf/pmu-events/arch/x86/meteorlake/pipeline.json index 639789478073..eeaa7a97f71c 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json @@ -1,4 +1,32 @@ [ + { + "BriefDescription": "Cycles when divide unit is busy executing div= ide or square root operations.", + "CounterMask": "1", + "EventCode": "0xb0", + "EventName": "ARITH.DIV_ACTIVE", + "PublicDescription": "Counts cycles when divide unit is busy execu= ting divide or square root operations. 
Accounts for integer and floating-po= int operations.", + "SampleAfterValue": "1000003", + "UMask": "0x9", + "Unit": "cpu_core" + }, + { + "BriefDescription": "This event counts the cycles the integer divi= der is busy.", + "CounterMask": "1", + "EventCode": "0xb0", + "EventName": "ARITH.IDIV_ACTIVE", + "SampleAfterValue": "1000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of occurrences where a microcode assis= t is invoked by hardware.", + "EventCode": "0xc1", + "EventName": "ASSISTS.ANY", + "PublicDescription": "Counts the number of occurrences where a mic= rocode assist is invoked by hardware. Examples include AD (page Access Dirt= y), FP and AVX related assists.", + "SampleAfterValue": "100003", + "UMask": "0x1b", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts the total number of branch instruction= s retired for all branch types.", "EventCode": "0xc4", @@ -17,6 +45,104 @@ "SampleAfterValue": "400009", "Unit": "cpu_core" }, + { + "BriefDescription": "Conditional branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND", + "PEBS": "1", + "PublicDescription": "Counts conditional branch instructions retir= ed.", + "SampleAfterValue": "400009", + "UMask": "0x11", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Not taken branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND_NTAKEN", + "PEBS": "1", + "PublicDescription": "Counts not taken branch instructions retired= .", + "SampleAfterValue": "400009", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Taken conditional branch instructions retired= .", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.COND_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken conditional branch instructions= retired.", + "SampleAfterValue": "400009", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of far branch instructions = retired, includes far jump, far call and return, and interrupt call and ret= urn.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.FAR_BRANCH", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0xbf", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Far branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.FAR_BRANCH", + "PEBS": "1", + "PublicDescription": "Counts far branch instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x40", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Indirect near branch instructions retired (ex= cluding returns)", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.INDIRECT", + "PEBS": "1", + "PublicDescription": "Counts near indirect branch instructions ret= ired excluding returns. 
TSX abort is an indirect branch.", + "SampleAfterValue": "100003", + "UMask": "0x80", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of near CALL branch instruc= tions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_CALL", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0xf9", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Direct and indirect near call instructions re= tired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_CALL", + "PEBS": "1", + "PublicDescription": "Counts both direct and indirect near call in= structions retired.", + "SampleAfterValue": "100007", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Return instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_RETURN", + "PEBS": "1", + "PublicDescription": "Counts return instructions retired.", + "SampleAfterValue": "100007", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Taken branch instructions retired.", + "EventCode": "0xc4", + "EventName": "BR_INST_RETIRED.NEAR_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts taken branch instructions retired.", + "SampleAfterValue": "400009", + "UMask": "0x20", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts the total number of mispredicted branc= h instructions retired for all branch types.", "EventCode": "0xc5", @@ -35,6 +161,174 @@ "SampleAfterValue": "400009", "Unit": "cpu_core" }, + { + "BriefDescription": "All mispredicted branch instructions retired.= This precise event may be used to get the misprediction cost via the Retir= e_Latency field of PEBS. It fires on the instruction that immediately follo= ws the mispredicted branch.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.ALL_BRANCHES_COST", + "PEBS": "1", + "SampleAfterValue": "400009", + "UMask": "0x44", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of mispredicted JCC (Jump o= n Conditional Code) branch instructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0x7e", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Mispredicted conditional branch instructions = retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND", + "PEBS": "1", + "PublicDescription": "Counts mispredicted conditional branch instr= uctions retired.", + "SampleAfterValue": "400009", + "UMask": "0x11", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Mispredicted conditional branch instructions = retired. This precise event may be used to get the misprediction cost via t= he Retire_Latency field of PEBS. It fires on the instruction that immediate= ly follows the mispredicted branch.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND_COST", + "PEBS": "1", + "SampleAfterValue": "400009", + "UMask": "0x51", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Mispredicted non-taken conditional branch ins= tructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.COND_NTAKEN", + "PEBS": "1", + "PublicDescription": "Counts the number of conditional branch inst= ructions retired that were mispredicted and the branch direction was not ta= ken.", + "SampleAfterValue": "400009", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Mispredicted non-taken conditional branch ins= tructions retired. This precise event may be used to get the misprediction = cost via the Retire_Latency field of PEBS. 
It fires on the instruction that immediately follows the mispredicted branch.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.COND_NTAKEN_COST",
+        "PEBS": "1",
+        "SampleAfterValue": "400009",
+        "UMask": "0x50",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Number of branch instructions retired that were mispredicted and taken.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.COND_TAKEN",
+        "PEBS": "1",
+        "PublicDescription": "Counts taken conditional mispredicted branch instructions retired.",
+        "SampleAfterValue": "400009",
+        "UMask": "0x1",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Mispredicted taken conditional branch instructions retired. This precise event may be used to get the misprediction cost via the Retire_Latency field of PEBS. It fires on the instruction that immediately follows the mispredicted branch.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.COND_TAKEN_COST",
+        "PEBS": "1",
+        "SampleAfterValue": "400009",
+        "UMask": "0x41",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts the number of mispredicted near indirect JMP and near indirect CALL branch instructions retired.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT",
+        "PEBS": "1",
+        "SampleAfterValue": "200003",
+        "UMask": "0xeb",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Mispredicted near indirect branch instructions retired (excluding returns)",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT",
+        "PEBS": "1",
+        "PublicDescription": "Counts mispredicted near indirect branch instructions retired excluding returns. TSX abort is an indirect branch.",
+        "SampleAfterValue": "100003",
+        "UMask": "0x80",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Counts the number of mispredicted near indirect CALL branch instructions retired.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT_CALL",
+        "PEBS": "1",
+        "SampleAfterValue": "200003",
+        "UMask": "0xfb",
+        "Unit": "cpu_atom"
+    },
+    {
+        "BriefDescription": "Mispredicted indirect CALL retired.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT_CALL",
+        "PEBS": "1",
+        "PublicDescription": "Counts retired mispredicted indirect (near taken) CALL instructions, including both register and memory indirect.",
+        "SampleAfterValue": "400009",
+        "UMask": "0x2",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Mispredicted indirect CALL retired. This precise event may be used to get the misprediction cost via the Retire_Latency field of PEBS. It fires on the instruction that immediately follows the mispredicted branch.",
+        "EventCode": "0xc5",
+        "EventName": "BR_MISP_RETIRED.INDIRECT_CALL_COST",
+        "PEBS": "1",
+        "SampleAfterValue": "400009",
+        "UMask": "0x42",
+        "Unit": "cpu_core"
+    },
+    {
+        "BriefDescription": "Mispredicted near indirect branch instructions retired (excluding returns). This precise event may be used to get the misprediction cost via the Retire_Latency field of PEBS. 
It fires on the inst= ruction that immediately follows the mispredicted branch.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.INDIRECT_COST", + "PEBS": "1", + "SampleAfterValue": "100003", + "UMask": "0xc0", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of near branch instructions retired th= at were mispredicted and taken.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", + "PEBS": "1", + "PublicDescription": "Counts number of near branch instructions re= tired that were mispredicted and taken.", + "SampleAfterValue": "400009", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Mispredicted taken near branch instructions r= etired. This precise event may be used to get the misprediction cost via th= e Retire_Latency field of PEBS. It fires on the instruction that immediatel= y follows the mispredicted branch.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.NEAR_TAKEN_COST", + "PEBS": "1", + "SampleAfterValue": "400009", + "UMask": "0x60", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of mispredicted near RET br= anch instructions retired.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.RETURN", + "PEBS": "1", + "SampleAfterValue": "200003", + "UMask": "0xf7", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Mispredicted ret instructions retired. This p= recise event may be used to get the misprediction cost via the Retire_Laten= cy field of PEBS. It fires on the instruction that immediately follows the = mispredicted branch.", + "EventCode": "0xc5", + "EventName": "BR_MISP_RETIRED.RET_COST", + "PEBS": "1", + "SampleAfterValue": "100007", + "UMask": "0x48", + "Unit": "cpu_core" + }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = core clock cycles", "EventName": "CPU_CLK_UNHALTED.CORE", @@ -49,6 +343,33 @@ "SampleAfterValue": "2000003", "Unit": "cpu_atom" }, + { + "BriefDescription": "Cycle counts are evenly distributed between a= ctive threads in the Core.", + "EventCode": "0xec", + "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", + "PublicDescription": "This event distributes cycle counts between = active hyperthreads, i.e., those in C0. A hyperthread becomes inactive whe= n it executes the HLT or MWAIT instructions. If all other hyperthreads are= inactive (or disabled or do not exist), all counts are attributed to this = hyperthread. To obtain the full count when the Core is active, sum the coun= ts from each hyperthread.", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Core crystal clock cycles when this thread is= unhalted and the other thread is halted.", + "EventCode": "0x3c", + "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", + "PublicDescription": "Counts Core crystal clock cycles when curren= t thread is unhalted and the other thread is halted.", + "SampleAfterValue": "25003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Core crystal clock cycles. Cycle counts are e= venly distributed between active threads in the Core.", + "EventCode": "0x3c", + "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", + "PublicDescription": "This event distributes Core crystal clock cy= cle counts between active hyperthreads, i.e., those in C0 sleep-state. A hy= perthread becomes inactive when it executes the HLT or MWAIT instructions. = If one thread is active in a core, all counts are attributed to this hypert= hread. 
To obtain the full count when the Core is active, sum the counts fro= m each hyperthread.", + "SampleAfterValue": "2000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, { "BriefDescription": "Fixed Counter: Counts the number of unhalted = reference clock cycles", "EventName": "CPU_CLK_UNHALTED.REF_TSC", @@ -64,6 +385,15 @@ "UMask": "0x3", "Unit": "cpu_core" }, + { + "BriefDescription": "Counts the number of unhalted reference clock= cycles at TSC frequency.", + "EventCode": "0x3c", + "EventName": "CPU_CLK_UNHALTED.REF_TSC_P", + "PublicDescription": "Counts the number of reference cycles that t= he core is not in a halt state. The core enters the halt state when it is r= unning the HLT instruction. This event is not affected by core frequency ch= anges and increments at a fixed frequency that is also used for the Time St= amp Counter (TSC). This event uses a programmable general purpose performan= ce counter.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, { "BriefDescription": "Reference cycles when the core is not in halt= state.", "EventCode": "0x3c", @@ -104,82 +434,487 @@ "Unit": "cpu_core" }, { - "BriefDescription": "Fixed Counter: Counts the number of instructi= ons retired", - "EventName": "INST_RETIRED.ANY", - "PEBS": "1", - "SampleAfterValue": "2000003", - "UMask": "0x1", - "Unit": "cpu_atom" + "BriefDescription": "Cycles while L1 cache miss demand load is out= standing.", + "CounterMask": "8", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x8", + "Unit": "cpu_core" }, { - "BriefDescription": "Number of instructions retired. Fixed Counter= - architectural event", - "EventName": "INST_RETIRED.ANY", - "PEBS": "1", - "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter.", - "SampleAfterValue": "2000003", + "BriefDescription": "Cycles while L2 cache miss demand load is out= standing.", + "CounterMask": "1", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", + "SampleAfterValue": "1000003", "UMask": "0x1", "Unit": "cpu_core" }, { - "BriefDescription": "Counts the number of instructions retired", - "EventCode": "0xc0", - "EventName": "INST_RETIRED.ANY_P", - "PEBS": "1", - "SampleAfterValue": "2000003", - "Unit": "cpu_atom" - }, - { - "BriefDescription": "Number of instructions retired. General Count= er - architectural event", - "EventCode": "0xc0", - "EventName": "INST_RETIRED.ANY_P", - "PEBS": "1", - "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. 
INST_RETIRED.ANY_P is counted by a programmable counter.", - "SampleAfterValue": "2000003", + "BriefDescription": "Cycles while memory subsystem has an outstand= ing load.", + "CounterMask": "16", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", + "SampleAfterValue": "1000003", + "UMask": "0x10", "Unit": "cpu_core" }, { - "BriefDescription": "Loads blocked due to overlapping with a prece= ding store that cannot be forwarded.", - "EventCode": "0x03", - "EventName": "LD_BLOCKS.STORE_FORWARD", - "PublicDescription": "Counts the number of times where store forwa= rding was prevented for a load operation. The most common case is a load bl= ocked due to the address of memory access (partially) overlapping with a pr= eceding uncompleted store. Note: See the table of not supported store forwa= rds in the Optimization Guide.", - "SampleAfterValue": "100003", - "UMask": "0x82", + "BriefDescription": "Execution stalls while L1 cache miss demand l= oad is outstanding.", + "CounterMask": "12", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", + "SampleAfterValue": "1000003", + "UMask": "0xc", "Unit": "cpu_core" }, { - "BriefDescription": "This event counts a subset of the Topdown Slo= ts event that were not consumed by the back-end pipeline due to lack of bac= k-end resources, as a result of memory subsystem delays, execution units li= mitations, or other conditions.", - "EventCode": "0xa4", - "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", - "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that were not consumed by the back-end pipeline due to lack of ba= ck-end resources, as a result of memory subsystem delays, execution units l= imitations, or other conditions.\nThe count is distributed among unhalted l= ogical processors (hyper-threads) who share the same physical core, in proc= essors that support Intel Hyper-Threading Technology. Software can use this= event as the numerator for the Backend Bound metric (or top-level category= ) of the Top-down Microarchitecture Analysis method.", - "SampleAfterValue": "10000003", - "UMask": "0x2", + "BriefDescription": "Execution stalls while L2 cache miss demand l= oad is outstanding.", + "CounterMask": "5", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x5", "Unit": "cpu_core" }, { - "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", - "EventName": "TOPDOWN.SLOTS", - "PublicDescription": "Number of available slots for an unhalted lo= gical processor. The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", - "SampleAfterValue": "10000003", + "BriefDescription": "Total execution stalls.", + "CounterMask": "4", + "EventCode": "0xa3", + "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", + "SampleAfterValue": "1000003", "UMask": "0x4", "Unit": "cpu_core" }, { - "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. 
General counter - architectural event", - "EventCode": "0xa4", - "EventName": "TOPDOWN.SLOTS_P", - "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", - "SampleAfterValue": "10000003", - "UMask": "0x1", + "BriefDescription": "Cycles total of 1 uop is executed on all port= s and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", + "PublicDescription": "Counts cycles during which a total of 1 uop = was executed on all ports and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x2", "Unit": "cpu_core" }, { - "BriefDescription": "Counts the number of issue slots that were no= t consumed by the backend because allocation is stalled due to a mispredict= ed jump or a machine clear.", + "BriefDescription": "Cycles total of 2 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", + "PublicDescription": "Counts cycles during which a total of 2 uops= were executed on all ports and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles total of 3 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", + "PublicDescription": "Cycles total of 3 uops are executed on all p= orts and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles total of 4 uops are executed on all po= rts and Reservation Station was not empty.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", + "PublicDescription": "Cycles total of 4 uops are executed on all p= orts and Reservation Station (RS) was not empty.", + "SampleAfterValue": "2000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Execution stalls while memory subsystem has a= n outstanding load.", + "CounterMask": "5", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", + "SampleAfterValue": "2000003", + "UMask": "0x21", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where the Store Buffer was full and no= loads caused an execution stall.", + "CounterMask": "2", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", + "PublicDescription": "Counts cycles where the Store Buffer was ful= l and no loads caused an execution stall.", + "SampleAfterValue": "1000003", + "UMask": "0x40", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles no uop executed while RS was not empty= , the SB was not full and there was no outstanding load.", + "EventCode": "0xa6", + "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", + "PublicDescription": "Number of cycles total of 0 uops executed on= all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) w= as not full and there was no outstanding load.", + "SampleAfterValue": "1000003", + "UMask": "0x80", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Instruction decoders utilized in a cycle", + "EventCode": "0x75", + "EventName": "INST_DECODED.DECODERS", + "PublicDescription": "Number of decoders 
utilized in a cycle when = the MITE (legacy decode pipeline) fetches instructions.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Fixed Counter: Counts the number of instructi= ons retired", + "EventName": "INST_RETIRED.ANY", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Number of instructions retired. Fixed Counter= - architectural event", + "EventName": "INST_RETIRED.ANY", + "PEBS": "1", + "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of instructions retired", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.ANY_P", + "PEBS": "1", + "SampleAfterValue": "2000003", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Number of instructions retired. General Count= er - architectural event", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.ANY_P", + "PEBS": "1", + "PublicDescription": "Counts the number of X86 instructions retire= d - an Architectural PerfMon event. Counting continues during hardware inte= rrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is co= unted by a designated fixed counter freeing up programmable counters to cou= nt other events. INST_RETIRED.ANY_P is counted by a programmable counter.", + "SampleAfterValue": "2000003", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INST_RETIRED.MACRO_FUSED", + "EventCode": "0xc0", + "EventName": "INST_RETIRED.MACRO_FUSED", + "SampleAfterValue": "2000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Precise instruction retired with PEBS precise= -distribution", + "EventName": "INST_RETIRED.PREC_DIST", + "PEBS": "1", + "PublicDescription": "A version of INST_RETIRED that allows for a = precise distribution of samples across instructions retired. It utilizes th= e Precise Distribution of Instructions Retired (PDIR++) feature to fix bias= in how retired instructions get sampled. 
Use on Fixed Counter 0.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles the Backend cluster is recovering afte= r a miss-speculation or a Store Buffer or Load Buffer drain stall.", + "CounterMask": "1", + "EventCode": "0xad", + "EventName": "INT_MISC.ALL_RECOVERY_CYCLES", + "PublicDescription": "Counts cycles the Backend cluster is recover= ing after a miss-speculation or a Store Buffer or Load Buffer drain stall.", + "SampleAfterValue": "2000003", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts cycles after recovery from a branch mi= sprediction or machine clear till the first uop is issued from the resteere= d path.", + "EventCode": "0xad", + "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", + "PublicDescription": "Cycles after recovery from a branch mispredi= ction or machine clear till the first uop is issued from the resteered path= .", + "SampleAfterValue": "500009", + "UMask": "0x80", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Core cycles the allocator was stalled due to = recovery from earlier clear event for this thread", + "EventCode": "0xad", + "EventName": "INT_MISC.RECOVERY_CYCLES", + "PublicDescription": "Counts core cycles when the Resource allocat= or was stalled due to recovery from an earlier branch misprediction or mach= ine clear event.", + "SampleAfterValue": "500009", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INT_MISC.UNKNOWN_BRANCH_CYCLES", + "EventCode": "0xad", + "EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES", + "MSRIndex": "0x3F7", + "MSRValue": "0x7", + "SampleAfterValue": "1000003", + "UMask": "0x40", + "Unit": "cpu_core" + }, + { + "BriefDescription": "TMA slots where uops got dropped", + "EventCode": "0xad", + "EventName": "INT_MISC.UOP_DROPPING", + "PublicDescription": "Estimated number of Top-down Microarchitectu= re Analysis slots that got dropped due to non front-end reasons", + "SampleAfterValue": "1000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INT_VEC_RETIRED.128BIT", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.128BIT", + "SampleAfterValue": "1000003", + "UMask": "0x13", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INT_VEC_RETIRED.256BIT", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.256BIT", + "SampleAfterValue": "1000003", + "UMask": "0xac", + "Unit": "cpu_core" + }, + { + "BriefDescription": "integer ADD, SUB, SAD 128-bit vector instruct= ions.", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.ADD_128", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 128-bit vector instructions.", + "SampleAfterValue": "1000003", + "UMask": "0x3", + "Unit": "cpu_core" + }, + { + "BriefDescription": "integer ADD, SUB, SAD 256-bit vector instruct= ions.", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.ADD_256", + "PublicDescription": "Number of retired integer ADD/SUB (regular o= r horizontal), SAD 256-bit vector instructions.", + "SampleAfterValue": "1000003", + "UMask": "0xc", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INT_VEC_RETIRED.MUL_256", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.MUL_256", + "SampleAfterValue": "1000003", + "UMask": "0x80", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INT_VEC_RETIRED.SHUFFLES", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.SHUFFLES", + "SampleAfterValue": "1000003", + "UMask": "0x40", + "Unit": "cpu_core" + }, + { + "BriefDescription": 
"INT_VEC_RETIRED.VNNI_128", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.VNNI_128", + "SampleAfterValue": "1000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "INT_VEC_RETIRED.VNNI_256", + "EventCode": "0xe7", + "EventName": "INT_VEC_RETIRED.VNNI_256", + "SampleAfterValue": "1000003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of retired loads that are b= locked because it initially appears to be store forward blocked, but subseq= uently is shown not to be blocked based on 4K alias check.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.ADDRESS_ALIAS", + "PEBS": "1", + "SampleAfterValue": "1000003", + "UMask": "0x4", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of retired loads that are b= locked because its address exactly matches an older store whose data is not= ready.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.DATA_UNKNOWN", + "PEBS": "1", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of retired loads that are b= locked because its address partially overlapped with an older store.", + "EventCode": "0x03", + "EventName": "LD_BLOCKS.STORE_FORWARD", + "PEBS": "1", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Cycles Uops delivered by the LSD, but didn't = come from the decoder.", + "CounterMask": "1", + "EventCode": "0xa8", + "EventName": "LSD.CYCLES_ACTIVE", + "PublicDescription": "Counts the cycles when at least one uop is d= elivered by the LSD (Loop-stream detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles optimal number of Uops delivered by th= e LSD, but did not come from the decoder.", + "CounterMask": "6", + "EventCode": "0xa8", + "EventName": "LSD.CYCLES_OK", + "PublicDescription": "Counts the cycles when optimal number of uop= s is delivered by the LSD (Loop-stream detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of Uops delivered by the LSD.", + "EventCode": "0xa8", + "EventName": "LSD.UOPS", + "PublicDescription": "Counts the number of uops delivered to the b= ack-end by the LSD(Loop Stream Detector).", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of machine clears (nukes) of any type.= ", + "CounterMask": "1", + "EdgeDetect": "1", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.COUNT", + "PublicDescription": "Counts the number of machine clears (nukes) = of any type.", + "SampleAfterValue": "100003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of machine clears due to me= mory ordering in which an internal load passes an older store within the sa= me CPU.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.DISAMBIGUATION", + "SampleAfterValue": "20003", + "UMask": "0x8", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of machine clears due to a = page fault. Counts both I-Side and D-Side (Loads/Stores) page faults. 
A p= age fault occurs when either the page is not present, or an access violatio= n occurs.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.PAGE_FAULT", + "SampleAfterValue": "20003", + "UMask": "0x20", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of machine clears that flus= h the pipeline and restart the machine with the use of microcode due to SMC= , MEMORY_ORDERING, FP_ASSISTS, PAGE_FAULT, DISAMBIGUATION, and FPC_VIRTUAL_= TRAP.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.SLOW", + "SampleAfterValue": "20003", + "UMask": "0x6f", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of machine clears due to pr= ogram modifying data (self modifying code) within 1K of a recently fetched = code page.", + "EventCode": "0xc3", + "EventName": "MACHINE_CLEARS.SMC", + "SampleAfterValue": "20003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts cycles where the pipeline is stalled d= ue to serializing operations.", + "EventCode": "0xa2", + "EventName": "RESOURCE_STALLS.SCOREBOARD", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "This event counts a subset of the Topdown Slo= ts event that were not consumed by the back-end pipeline due to lack of bac= k-end resources, as a result of memory subsystem delays, execution units li= mitations, or other conditions.", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", + "PublicDescription": "This event counts a subset of the Topdown Sl= ots event that were not consumed by the back-end pipeline due to lack of ba= ck-end resources, as a result of memory subsystem delays, execution units l= imitations, or other conditions.\nThe count is distributed among unhalted l= ogical processors (hyper-threads) who share the same physical core, in proc= essors that support Intel Hyper-Threading Technology. Software can use this= event as the numerator for the Backend Bound metric (or top-level category= ) of the Top-down Microarchitecture Analysis method.", + "SampleAfterValue": "10000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "TMA slots wasted due to incorrect speculation= s.", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BAD_SPEC_SLOTS", + "PublicDescription": "Number of slots of TMA method that were wast= ed due to incorrect speculation. It covers all types of control-flow or dat= a-related mis-speculations.", + "SampleAfterValue": "10000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "TMA slots wasted due to incorrect speculation= by branch mispredictions", + "EventCode": "0xa4", + "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", + "SampleAfterValue": "10000003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "TOPDOWN.MEMORY_BOUND_SLOTS", + "EventCode": "0xa4", + "EventName": "TOPDOWN.MEMORY_BOUND_SLOTS", + "SampleAfterValue": "10000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. Fixed counter - architectural event", + "EventName": "TOPDOWN.SLOTS", + "PublicDescription": "Number of available slots for an unhalted lo= gical processor. 
The event increments by machine-width of the narrowest pip= eline as employed by the Top-down Microarchitecture Analysis method (TMA). = The count is distributed among unhalted logical processors (hyper-threads) = who share the same physical core. Software can use this event as the denomi= nator for the top-level metrics of the TMA method. This architectural event= is counted on a designated fixed counter (Fixed Counter 3).", + "SampleAfterValue": "10000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "TMA slots available for an unhalted logical p= rocessor. General counter - architectural event", + "EventCode": "0xa4", + "EventName": "TOPDOWN.SLOTS_P", + "PublicDescription": "Counts the number of available slots for an = unhalted logical processor. The event increments by machine-width of the na= rrowest pipeline as employed by the Top-down Microarchitecture Analysis met= hod. The count is distributed among unhalted logical processors (hyper-thre= ads) who share the same physical core.", + "SampleAfterValue": "10000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of issue slots that were no= t consumed by the backend because allocation is stalled due to a mispredict= ed jump or a machine clear.", "EventCode": "0x73", "EventName": "TOPDOWN_BAD_SPECULATION.ALL", "PublicDescription": "Counts the total number of issue slots that = were not consumed by the backend because allocation is stalled due to a mis= predicted jump or a machine clear. Only issue slots wasted due to fast nuke= s such as memory ordering nukes are counted. Other nukes are not accounted = for. Counts all issue slots blocked during this recovery window, including = relevant microcode flows, and while uops are not yet available in the instr= uction queue (IQ) or until an FE_BOUND event occurs besides OTHER and CISC.= Also includes the issue slots that were consumed by the backend but were t= hrown away because they were younger than the mispredict or machine clear.", "SampleAfterValue": "1000003", "Unit": "cpu_atom" }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not consumed by the backend due to Fast Nukes such as Memory Ord= ering Machine clears and MRN nukes", + "EventCode": "0x73", + "EventName": "TOPDOWN_BAD_SPECULATION.FASTNUKE", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the total number of issue slots that w= ere not consumed by the backend because allocation is stalled due to a mach= ine clear (nuke) of any kind including memory ordering and memory disambigu= ation.", + "EventCode": "0x73", + "EventName": "TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS", + "SampleAfterValue": "1000003", + "UMask": "0x3", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not consumed by the backend due to a machine clear (nuke).", + "EventCode": "0x73", + "EventName": "TOPDOWN_BAD_SPECULATION.NUKE", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of retirement slots not con= sumed due to backend stalls", "EventCode": "0x74", @@ -187,6 +922,30 @@ "SampleAfterValue": "1000003", "Unit": "cpu_atom" }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not consumed by the backend due to due to certain allocation rest= rictions", + "EventCode": "0x74", + "EventName": "TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS", + 
"SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not consumed by the backend due to mrbl stall. A 'marble' refers= to a physical register file entry, also known as the physical destination = (PDST).", + "EventCode": "0x74", + "EventName": "TOPDOWN_BE_BOUND.REGISTER", + "SampleAfterValue": "1000003", + "UMask": "0x20", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not consumed by the backend due to iq/jeu scoreboards or ms scb", + "EventCode": "0x74", + "EventName": "TOPDOWN_BE_BOUND.SERIALIZATION", + "SampleAfterValue": "1000003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of retirement slots not con= sumed due to front end stalls", "EventCode": "0x71", @@ -194,6 +953,79 @@ "SampleAfterValue": "1000003", "Unit": "cpu_atom" }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to BAClear", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.BRANCH_DETECT", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to BTClear", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.BRANCH_RESTEER", + "SampleAfterValue": "1000003", + "UMask": "0x40", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to ms", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.CISC", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to decode stall", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.DECODE", + "SampleAfterValue": "1000003", + "UMask": "0x8", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to frontend bandwidth restricti= ons due to decode, predecode, cisc, and other limitations.", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH", + "SampleAfterValue": "1000003", + "UMask": "0x8d", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to latency related stalls inclu= ding BACLEARs, BTCLEARs, ITLB misses, and ICache misses.", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.FRONTEND_LATENCY", + "SampleAfterValue": "1000003", + "UMask": "0x72", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "This event is deprecated. 
[This event is alia= s to TOPDOWN_FE_BOUND.ITLB_MISS]", + "Deprecated": "1", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.ITLB", + "SampleAfterValue": "1000003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to itlb miss [This event is ali= as to TOPDOWN_FE_BOUND.ITLB]", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.ITLB_MISS", + "SampleAfterValue": "1000003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of issue slots every cycle = that were not delivered by the frontend due to predecode wrong", + "EventCode": "0x71", + "EventName": "TOPDOWN_FE_BOUND.PREDECODE", + "SampleAfterValue": "1000003", + "UMask": "0x4", + "Unit": "cpu_atom" + }, { "BriefDescription": "Counts the number of consumed retirement slot= s. Similar to UOPS_RETIRED.ALL", "EventCode": "0x72", @@ -202,6 +1034,269 @@ "SampleAfterValue": "1000003", "Unit": "cpu_atom" }, + { + "BriefDescription": "Number of non dec-by-all uops decoded by deco= der", + "EventCode": "0x76", + "EventName": "UOPS_DECODED.DEC0_UOPS", + "PublicDescription": "This event counts the number of not dec-by-a= ll uops decoded by decoder 0.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on port 0", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_0", + "PublicDescription": "Number of uops dispatch to execution port 0= .", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on port 1", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_1", + "PublicDescription": "Number of uops dispatch to execution port 1= .", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on ports 2, 3 and 10", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_2_3_10", + "PublicDescription": "Number of uops dispatch to execution ports 2= , 3 and 10", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on ports 4 and 9", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_4_9", + "PublicDescription": "Number of uops dispatch to execution ports 4= and 9", + "SampleAfterValue": "2000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on ports 5 and 11", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_5_11", + "PublicDescription": "Number of uops dispatch to execution ports 5= and 11", + "SampleAfterValue": "2000003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on port 6", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_6", + "PublicDescription": "Number of uops dispatch to execution port 6= .", + "SampleAfterValue": "2000003", + "UMask": "0x40", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Uops executed on ports 7 and 8", + "EventCode": "0xb2", + "EventName": "UOPS_DISPATCHED.PORT_7_8", + "PublicDescription": "Number of uops dispatch to execution ports = 7 and 8.", + "SampleAfterValue": "2000003", + "UMask": "0x80", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Number of uops executed on the core.", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CORE", + "PublicDescription": "Counts the number of uops executed from any = thread.", + "SampleAfterValue": "2000003", + "UMask": 
"0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles at least 1 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "1", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", + "PublicDescription": "Counts cycles when at least 1 micro-op is ex= ecuted from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles at least 2 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "2", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", + "PublicDescription": "Counts cycles when at least 2 micro-ops are = executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles at least 3 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "3", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", + "PublicDescription": "Counts cycles when at least 3 micro-ops are = executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles at least 4 micro-op is executed from a= ny thread on physical core.", + "CounterMask": "4", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", + "PublicDescription": "Counts cycles when at least 4 micro-ops are = executed from any thread on physical core.", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where at least 1 uop was executed per-= thread", + "CounterMask": "1", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_1", + "PublicDescription": "Cycles where at least 1 uop was executed per= -thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where at least 2 uops were executed pe= r-thread", + "CounterMask": "2", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_2", + "PublicDescription": "Cycles where at least 2 uops were executed p= er-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where at least 3 uops were executed pe= r-thread", + "CounterMask": "3", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_3", + "PublicDescription": "Cycles where at least 3 uops were executed p= er-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles where at least 4 uops were executed pe= r-thread", + "CounterMask": "4", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.CYCLES_GE_4", + "PublicDescription": "Cycles where at least 4 uops were executed p= er-thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts number of cycles no uops were dispatch= ed to be executed on this thread.", + "CounterMask": "1", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.STALLS", + "Invert": "1", + "PublicDescription": "Counts cycles during which no uops were disp= atched from the Reservation Station (RS) per thread.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of uops to be executed per-= thread each cycle.", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.THREAD", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": 
"cpu_core" + }, + { + "BriefDescription": "Counts the number of x87 uops dispatched.", + "EventCode": "0xb1", + "EventName": "UOPS_EXECUTED.X87", + "PublicDescription": "Counts the number of x87 uops executed.", + "SampleAfterValue": "2000003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of uops issued by the front= end every cycle.", + "EventCode": "0x0e", + "EventName": "UOPS_ISSUED.ANY", + "PublicDescription": "Counts the number of uops issued by the fron= t end every cycle. When 4-uops are requested and only 2-uops are delivered,= the event counts 2. Uops_issued correlates to the number of ROB entries. = If uop takes 2 ROB slots it counts as 2 uops_issued.", + "SampleAfterValue": "1000003", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Uops that RAT issues to RS", + "EventCode": "0xae", + "EventName": "UOPS_ISSUED.ANY", + "PublicDescription": "Counts the number of uops that the Resource = Allocation Table (RAT) issues to the Reservation Station (RS).", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "UOPS_ISSUED.CYCLES", + "CounterMask": "1", + "EventCode": "0xae", + "EventName": "UOPS_ISSUED.CYCLES", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles when RAT does not issue Uops to RS for= the thread", + "CounterMask": "1", + "EventCode": "0xae", + "EventName": "UOPS_ISSUED.STALLS", + "Invert": "1", + "PublicDescription": "Counts cycles during which the Resource Allo= cation Table (RAT) does not issue any Uops to the reservation station (RS) = for the current thread.", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Retired uops except the last uop of each inst= ruction.", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.HEAVY", + "PublicDescription": "Counts the number of retired micro-operation= s (uops) except the last uop of each instruction. An instruction that is de= coded into less than two uops does not contribute to the count.", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of integer divide uops reti= red.", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.IDIV", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of uops that are from the c= omplex flows issued by the micro-sequencer (MS). This includes uops from f= lows due to complex instructions, faults, assists, and inserted flows.", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.MS", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "UOPS_RETIRED.MS", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.MS", + "MSRIndex": "0x3F7", + "MSRValue": "0x8", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_core" + }, { "BriefDescription": "This event counts a subset of the Topdown Slo= ts event that are utilized by operations that eventually get retired (commi= tted) by the processor pipeline. 
Usually, this event positively correlates = with higher performance for example, as measured by the instructions-per-c= ycle metric.", "EventCode": "0xc2", @@ -210,5 +1305,25 @@ "SampleAfterValue": "2000003", "UMask": "0x2", "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles with less than 10 actually retired uop= s.", + "CounterMask": "10", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.TOTAL_CYCLES", + "Invert": "1", + "PublicDescription": "Counts the number of cycles using always tru= e condition (uops_ret < 16) applied to non PEBS uops retired event.", + "SampleAfterValue": "1000003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of x87 uops retired, includ= es those in ms flows", + "EventCode": "0xc2", + "EventName": "UOPS_RETIRED.X87", + "PEBS": "1", + "SampleAfterValue": "2000003", + "UMask": "0x2", + "Unit": "cpu_atom" } ] diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/uncore-cache.json b/= tools/perf/pmu-events/arch/x86/meteorlake/uncore-cache.json new file mode 100644 index 000000000000..188843be4caf --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/meteorlake/uncore-cache.json @@ -0,0 +1,18 @@ +[ + { + "BriefDescription": "Number of all entries allocated. Includes als= o retries.", + "EventCode": "0x35", + "EventName": "UNC_HAC_CBO_TOR_ALLOCATION.ALL", + "PerPkg": "1", + "UMask": "0x8", + "Unit": "HAC_CBO" + }, + { + "BriefDescription": "Asserted on coherent DRD + DRdPref allocatio= ns into the queue. Cacheable only", + "EventCode": "0x35", + "EventName": "UNC_HAC_CBO_TOR_ALLOCATION.DRD", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "HAC_CBO" + } +] diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/uncore-interconnect.= json b/tools/perf/pmu-events/arch/x86/meteorlake/uncore-interconnect.json new file mode 100644 index 000000000000..08b5c7574cfc --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/meteorlake/uncore-interconnect.json @@ -0,0 +1,42 @@ +[ + { + "BriefDescription": "Number of all coherent Data Read entries. Doe= sn't include prefetches", + "EventCode": "0x81", + "EventName": "UNC_HAC_ARB_REQ_TRK_REQUEST.DRD", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "HAC_ARB" + }, + { + "BriefDescription": "Number of all CMI transactions", + "EventCode": "0x8a", + "EventName": "UNC_HAC_ARB_TRANSACTIONS.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "HAC_ARB" + }, + { + "BriefDescription": "Number of all CMI reads", + "EventCode": "0x8a", + "EventName": "UNC_HAC_ARB_TRANSACTIONS.READS", + "PerPkg": "1", + "UMask": "0x2", + "Unit": "HAC_ARB" + }, + { + "BriefDescription": "Number of all CMI writes not including Mflush= ", + "EventCode": "0x8a", + "EventName": "UNC_HAC_ARB_TRANSACTIONS.WRITES", + "PerPkg": "1", + "UMask": "0x4", + "Unit": "HAC_ARB" + }, + { + "BriefDescription": "Total number of all outgoing entries allocate= d. 
Accounts for Coherent and non-coherent traffic.", + "EventCode": "0x81", + "EventName": "UNC_HAC_ARB_TRK_REQUESTS.ALL", + "PerPkg": "1", + "UMask": "0x1", + "Unit": "HAC_ARB" + } +] diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/uncore-memory.json b= /tools/perf/pmu-events/arch/x86/meteorlake/uncore-memory.json new file mode 100644 index 000000000000..c9d248d1042e --- /dev/null +++ b/tools/perf/pmu-events/arch/x86/meteorlake/uncore-memory.json @@ -0,0 +1,126 @@ +[ + { + "BriefDescription": "Counts every CAS read command sent from the M= emory Controller 0 to DRAM (sum of all channels).", + "EventCode": "0xff", + "EventName": "UNC_MC0_RDCAS_COUNT_FREERUN", + "PerPkg": "1", + "PublicDescription": "Counts every CAS read command sent from the = Memory Controller 0 to DRAM (sum of all channels). Each CAS commands can be= for 32B or 64B of data.", + "UMask": "0x20", + "Unit": "imc_free_running_0" + }, + { + "BriefDescription": "Counts every read and write request entering = the Memory Controller 0.", + "EventCode": "0xff", + "EventName": "UNC_MC0_TOTAL_REQCOUNT_FREERUN", + "PerPkg": "1", + "PublicDescription": "Counts every read and write request entering= the Memory Controller 0 (sum of all channels). All requests are counted as= one, whether they are 32B or 64B Read/Write or partial/full line writes. S= ome write requests to the same address may merge to a single write command = to DRAM. Therefore, the total request count may be higher than total DRAM B= W.", + "UMask": "0x10", + "Unit": "imc_free_running_0" + }, + { + "BriefDescription": "Counts every CAS write command sent from the = Memory Controller 0 to DRAM (sum of all channels).", + "EventCode": "0xff", + "EventName": "UNC_MC0_WRCAS_COUNT_FREERUN", + "PerPkg": "1", + "PublicDescription": "Counts every CAS write command sent from the= Memory Controller 0 to DRAM (sum of all channels). Each CAS commands can = be for 32B or 64B of data.", + "UMask": "0x30", + "Unit": "imc_free_running_0" + }, + { + "BriefDescription": "Counts every CAS read command sent from the M= emory Controller 1 to DRAM (sum of all channels).", + "EventCode": "0xff", + "EventName": "UNC_MC1_RDCAS_COUNT_FREERUN", + "PerPkg": "1", + "PublicDescription": "Counts every CAS read command sent from the = Memory Controller 1 to DRAM (sum of all channels). Each CAS commands can be= for 32B or 64B of data.", + "UMask": "0x20", + "Unit": "imc_free_running_1" + }, + { + "BriefDescription": "Counts every read and write request entering = the Memory Controller 1.", + "EventCode": "0xff", + "EventName": "UNC_MC1_TOTAL_REQCOUNT_FREERUN", + "PerPkg": "1", + "PublicDescription": "Counts every read and write request entering= the Memory Controller 1 (sum of all channels). All requests are counted as= one, whether they are 32B or 64B Read/Write or partial/full line writes. S= ome write requests to the same address may merge to a single write command = to DRAM. Therefore, the total request count may be higher than total DRAM B= W.", + "UMask": "0x10", + "Unit": "imc_free_running_1" + }, + { + "BriefDescription": "Counts every CAS write command sent from the = Memory Controller 1 to DRAM (sum of all channels).", + "EventCode": "0xff", + "EventName": "UNC_MC1_WRCAS_COUNT_FREERUN", + "PerPkg": "1", + "PublicDescription": "Counts every CAS write command sent from the= Memory Controller 1 to DRAM (sum of all channels). 
Each CAS commands can be for 32B or 64B of data.", + "UMask": "0x30", + "Unit": "imc_free_running_1" + }, + { + "BriefDescription": "ACT command for a read request sent to DRAM", + "EventCode": "0x24", + "EventName": "UNC_M_ACT_COUNT_RD", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "ACT command sent to DRAM", + "EventCode": "0x26", + "EventName": "UNC_M_ACT_COUNT_TOTAL", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "ACT command for a write request sent to DRAM", + "EventCode": "0x25", + "EventName": "UNC_M_ACT_COUNT_WR", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "Read CAS command sent to DRAM", + "EventCode": "0x22", + "EventName": "UNC_M_CAS_COUNT_RD", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "Write CAS command sent to DRAM", + "EventCode": "0x23", + "EventName": "UNC_M_CAS_COUNT_WR", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "PRE command sent to DRAM due to page table idle timer expiration", + "EventCode": "0x28", + "EventName": "UNC_M_PRE_COUNT_IDLE", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "PRE command sent to DRAM for a read/write request", + "EventCode": "0x27", + "EventName": "UNC_M_PRE_COUNT_PAGE_MISS", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of bytes read from DRAM, in 32B chunks. Counter increments by 1 after receiving 32B chunk data.", + "EventCode": "0x3A", + "EventName": "UNC_M_RD_DATA", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "Total number of read and write byte transfers to/from DRAM, in 32B chunks. Counter increments by 1 after sending or receiving 32B chunk data.", + "EventCode": "0x3C", + "EventName": "UNC_M_TOTAL_DATA", + "PerPkg": "1", + "Unit": "iMC" + }, + { + "BriefDescription": "Number of bytes written to DRAM, in 32B chunks. Counter increments by 1 after sending 32B chunk data.", + "EventCode": "0x3B", + "EventName": "UNC_M_WR_DATA", + "PerPkg": "1", + "Unit": "iMC" + } +]
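Not part of the patch, but a quick way to sanity-check the iMC data counts above once perf is rebuilt from this tree: the 32-bytes-per-increment scaling comes straight from the UNC_M_RD_DATA/UNC_M_WR_DATA descriptions, while the event-name spelling and the need for root plus system-wide mode are assumptions about the target machine.

  $ perf stat -a -e unc_m_rd_data,unc_m_wr_data sleep 1
  # Rough DRAM bandwidth estimate: each count is one 32B chunk, so
  #   read_bytes_per_sec  ~= unc_m_rd_data * 32 / 1
  #   write_bytes_per_sec ~= unc_m_wr_data * 32 / 1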
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json index 556e4292fcc8..056c2a885a32 100644 --- a/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json +++ b/tools/perf/pmu-events/arch/x86/meteorlake/virtual-memory.json @@ -1,4 +1,39 @@ [ + { + "BriefDescription": "Counts the number of first level TLB misses but second level hits due to a demand load that did not start a page walk. Accounts for all page sizes. Will result in a DTLB write from STLB.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.STLB_HIT", + "SampleAfterValue": "200003", + "UMask": "0x20", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Loads that miss the DTLB and hit the STLB.", + "EventCode": "0x12", + "EventName": "DTLB_LOAD_MISSES.STLB_HIT", + "PublicDescription": "Counts loads that miss the DTLB (Data TLB) and hit the STLB (Second level TLB).", + "SampleAfterValue": "100003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a demand load.", + "CounterMask": "1", + "EventCode": "0x12", + "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE", + "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a demand load.", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks completed due to load DTLB misses.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED", + "SampleAfterValue": "200003", + "UMask": "0xe", + "Unit": "cpu_atom" + }, { "BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)", "EventCode": "0x12", @@ -8,6 +43,95 @@ "UMask": "0xe", "Unit": "cpu_core" }, + { + "BriefDescription": "Page walks completed due to a demand data load to a 1G page.", + "EventCode": "0x12", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_1G", + "PublicDescription": "Counts completed page walks (1G sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.", + "SampleAfterValue": "100003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks completed due to load DTLB misses to a 2M or 4M page.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts the number of page walks completed due to loads (including SW prefetches) whose address translations missed in all Translation Lookaside Buffer (TLB) levels and were mapped to 2M or 4M pages. Includes page walks that page fault.", + "SampleAfterValue": "200003", + "UMask": "0x4", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Page walks completed due to a demand data load to a 2M/4M page.", + "EventCode": "0x12", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts completed page walks (2M/4M sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.", + "SampleAfterValue": "100003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Page walks completed due to a demand data load to a 4K page.", + "EventCode": "0x12", + "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K", + "PublicDescription": "Counts completed page walks (4K sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB.
The page walk can end with or without a fault= .", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks outstanding f= or Loads (demand or SW prefetch) in PMH every cycle.", + "EventCode": "0x08", + "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for Loads (demand or SW prefetch) in PMH every cycle. A PMH page walk is o= utstanding from page walk start till PMH becomes idle again (ready to serve= next walk). Includes EPT-walk intervals.", + "SampleAfterValue": "200003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Number of page walks outstanding for a demand= load in the PMH each cycle.", + "EventCode": "0x12", + "EventName": "DTLB_LOAD_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for a demand load in the PMH (Page Miss Handler) each cycle.", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of first level TLB misses b= ut second level hits due to stores that did not start a page walk. Accounts= for all pages sizes. Will result in a DTLB write from STLB.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.STLB_HIT", + "SampleAfterValue": "2000003", + "UMask": "0x20", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Stores that miss the DTLB and hit the STLB.", + "EventCode": "0x13", + "EventName": "DTLB_STORE_MISSES.STLB_HIT", + "PublicDescription": "Counts stores that miss the DTLB (Data TLB) = and hit the STLB (2nd Level TLB).", + "SampleAfterValue": "100003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles when at least one PMH is busy with a p= age walk for a store.", + "CounterMask": "1", + "EventCode": "0x13", + "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a store.", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks completed due= to store DTLB misses to a 1G page.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED", + "SampleAfterValue": "2000003", + "UMask": "0xe", + "Unit": "cpu_atom" + }, { "BriefDescription": "Store misses in all TLB levels causes a page = walk that completes. (All page sizes)", "EventCode": "0x13", @@ -17,6 +141,86 @@ "UMask": "0xe", "Unit": "cpu_core" }, + { + "BriefDescription": "Page walks completed due to a demand data sto= re to a 1G page.", + "EventCode": "0x13", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_1G", + "PublicDescription": "Counts completed page walks (1G sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", + "SampleAfterValue": "100003", + "UMask": "0x8", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Page walks completed due to a demand data sto= re to a 2M/4M page.", + "EventCode": "0x13", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts completed page walks (2M/4M sizes) c= aused by demand data stores. This implies address translations missed in th= e DTLB and further levels of TLB. 
The page walk can end with or without a f= ault.", + "SampleAfterValue": "100003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Page walks completed due to a demand data sto= re to a 4K page.", + "EventCode": "0x13", + "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K", + "PublicDescription": "Counts completed page walks (4K sizes) caus= ed by demand data stores. This implies address translations missed in the D= TLB and further levels of TLB. The page walk can end with or without a faul= t.", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks outstanding i= n the page miss handler (PMH) for stores every cycle.", + "EventCode": "0x49", + "EventName": "DTLB_STORE_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = in the page miss handler (PMH) for stores every cycle. A PMH page walk is o= utstanding from page walk start till PMH becomes idle again (ready to serve= next walk). Includes EPT-walk intervals.", + "SampleAfterValue": "200003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Number of page walks outstanding for a store = in the PMH each cycle.", + "EventCode": "0x13", + "EventName": "DTLB_STORE_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for a store in the PMH (Page Miss Handler) each cycle.", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks initiated by = a instruction fetch that missed the first and second level TLBs.", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.MISS_CAUSED_WALK", + "SampleAfterValue": "1000003", + "UMask": "0x1", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Counts the number of first level TLB misses b= ut second level hits due to an instruction fetch that did not start a page = walk. Account for all pages sizes. 
Will result in an ITLB write from STLB.", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.STLB_HIT", + "SampleAfterValue": "2000003", + "UMask": "0x20", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Instruction fetch requests that miss the ITLB= and hit the STLB.", + "EventCode": "0x11", + "EventName": "ITLB_MISSES.STLB_HIT", + "PublicDescription": "Counts instruction fetch requests that miss = the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).", + "SampleAfterValue": "100003", + "UMask": "0x20", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Cycles when at least one PMH is busy with a p= age walk for code (instruction fetch) request.", + "CounterMask": "1", + "EventCode": "0x11", + "EventName": "ITLB_MISSES.WALK_ACTIVE", + "PublicDescription": "Counts cycles when at least one PMH (Page Mi= ss Handler) is busy with a page walk for a code (instruction fetch) request= .", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, { "BriefDescription": "Counts the number of page walks completed due= to instruction fetch misses to any page size.", "EventCode": "0x85", @@ -34,5 +238,58 @@ "SampleAfterValue": "100003", "UMask": "0xe", "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks completed due= to instruction fetch misses to a 2M or 4M page.", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts the number of page walks completed du= e to instruction fetches whose address translations missed in all Translati= on Lookaside Buffer (TLB) levels and were mapped to 2M or 4M pages. Includ= es page walks that page fault.", + "SampleAfterValue": "2000003", + "UMask": "0x4", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Code miss in all TLB levels causes a page wal= k that completes. (2M/4M)", + "EventCode": "0x11", + "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M", + "PublicDescription": "Counts completed page walks (2M/4M page size= s) caused by a code fetch. This implies it missed in the ITLB (Instruction = TLB) and further levels of TLB. The page walk can end with or without a fau= lt.", + "SampleAfterValue": "100003", + "UMask": "0x4", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Code miss in all TLB levels causes a page wal= k that completes. (4K)", + "EventCode": "0x11", + "EventName": "ITLB_MISSES.WALK_COMPLETED_4K", + "PublicDescription": "Counts completed page walks (4K page sizes) = caused by a code fetch. This implies it missed in the ITLB (Instruction TLB= ) and further levels of TLB. The page walk can end with or without a fault.= ", + "SampleAfterValue": "100003", + "UMask": "0x2", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of page walks outstanding f= or iside in PMH every cycle.", + "EventCode": "0x85", + "EventName": "ITLB_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding = for iside in PMH every cycle. A PMH page walk is outstanding from page wal= k start till PMH becomes idle again (ready to serve next walk). Includes EP= T-walk intervals. 
Walks could be counted by edge detecting on this event, but would count restarted suspended walks.", + "SampleAfterValue": "200003", + "UMask": "0x10", + "Unit": "cpu_atom" + }, + { + "BriefDescription": "Number of page walks outstanding for an outstanding code request in the PMH each cycle.", + "EventCode": "0x11", + "EventName": "ITLB_MISSES.WALK_PENDING", + "PublicDescription": "Counts the number of page walks outstanding for an outstanding code (instruction fetch) request in the PMH (Page Miss Handler) each cycle.", + "SampleAfterValue": "100003", + "UMask": "0x10", + "Unit": "cpu_core" + }, + { + "BriefDescription": "Counts the number of cycles that the head (oldest load) of the load buffer and retirement are both stalled due to a DTLB miss.", + "EventCode": "0x05", + "EventName": "LD_HEAD.DTLB_MISS_AT_RET", + "SampleAfterValue": "1000003", + "UMask": "0x90", + "Unit": "cpu_atom" + } ] -- 2.41.0.162.gfafddb0af9-goog
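A minimal smoke test for the new Meteor Lake events defined above, assuming perf is rebuilt from this tree and run on Meteor Lake hardware; the event names come straight from the JSON in this patch, but whether each one opens successfully depends on kernel and firmware support:

  $ perf list | grep -i topdown_fe_bound    # the new cpu_atom events should be listed
  $ perf stat -a -e cpu_atom/TOPDOWN_FE_BOUND.ITLB_MISS/ sleep 1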
([2620:15c:2a3:200:6559:8968:cdfe:35b6]) (user=irogers job=sendgmr) by 2002:a25:694c:0:b0:bc1:dfa5:83a8 with SMTP id e73-20020a25694c000000b00bc1dfa583a8mr9580763ybc.10.1687533048557; Fri, 23 Jun 2023 08:10:48 -0700 (PDT) Date: Fri, 23 Jun 2023 08:10:10 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-7-irogers@google.com> Mime-Version: 1.0 References: <20230623151016.4193660-1-irogers@google.com> X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog Subject: [PATCH v2 06/12] perf vendor events intel: Update cascadelakex to 1.19 From: Ian Rogers To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Kan Liang , Zhengjun Xing , Kajol Jain , John Garry , Andrii Nakryiko , Eduard Zingerman , Jing Zhang , Sohom Datta , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Perry Taylor , Samantha Alt , Caleb Biggers , Weilin Wang , Edward Baker Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Updates were released in: https://github.com/intel/perfmon/commit/e4f83534f23a69e6da55c672c4d92991968= 8c9b6 Adds the events IDQ.DSB_CYCLES_OK, IDQ.DSB_CYCLES_ANY, ICACHE_TAG.STALLS, DECODE.LCP, LSD.CYCLES_OK. Descriptions are also updated. Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- .../arch/x86/cascadelakex/frontend.json | 43 ++++++++++++++++--- .../arch/x86/cascadelakex/pipeline.json | 17 ++++++-- .../x86/cascadelakex/uncore-interconnect.json | 2 +- .../arch/x86/cascadelakex/uncore-memory.json | 2 +- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 5 files changed, 54 insertions(+), 12 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/frontend.json b/to= ols/perf/pmu-events/arch/x86/cascadelakex/frontend.json index 04f08e4d2402..095904c77001 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/frontend.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/frontend.json @@ -7,6 +7,14 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to ILD_STALL.LCP]", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, { "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches", "EventCode": "0xAB", @@ -245,27 +253,34 @@ "UMask": "0x2" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", "EventCode": "0x83", "EventName": "ICACHE_64B.IFTAG_STALL", "SampleAfterValue": "200003", "UMask": "0x4" }, { - "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng 4 Uops", + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. 
[This event is alias to ICACHE_64B.IFTAG_STAL= L]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]", "CounterMask": "4", "EventCode": "0x79", "EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS", - "PublicDescription": "Counts the number of cycles 4 uops were deli= vered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB)= path. Count includes uops that may 'bypass' the IDQ.", + "PublicDescription": "Counts the number of cycles 4 uops were deli= vered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB)= path. Count includes uops that may 'bypass' the IDQ. [This event is alias = to IDQ.DSB_CYCLES_OK]", "SampleAfterValue": "2000003", "UMask": "0x18" }, { - "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng any Uop", + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]", "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS", - "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Count includes uops that may 'bypass' the IDQ.", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Count includes uops that may 'bypass' the IDQ. [This event is alias to= IDQ.DSB_CYCLES_ANY]", "SampleAfterValue": "2000003", "UMask": "0x18" }, @@ -296,6 +311,24 @@ "SampleAfterValue": "2000003", "UMask": "0x8" }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delive= red to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) p= ath. Count includes uops that may 'bypass' the IDQ. [This event is alias to= IDQ.ALL_DSB_CYCLES_ANY_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x18" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is deliveri= ng 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]", + "CounterMask": "4", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_OK", + "PublicDescription": "Counts the number of cycles 4 uops were deli= vered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB)= path. Count includes uops that may 'bypass' the IDQ. [This event is alias = to IDQ.ALL_DSB_CYCLES_4_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x18" + }, { "BriefDescription": "Uops delivered to Instruction Decode Queue (I= DQ) from the Decode Stream Buffer (DSB) path", "EventCode": "0x79", diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json b/to= ols/perf/pmu-events/arch/x86/cascadelakex/pipeline.json index 31a1663d57f8..66d686cc933e 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/pipeline.json @@ -361,10 +361,10 @@ "UMask": "0x1" }, { - "BriefDescription": "Stalls caused by changing prefix length of th= e instruction.", + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. 
[This event is alias to DECODE.LCP]", "EventCode": "0x87", "EventName": "ILD_STALL.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to DECODE.LCP]", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -488,11 +488,11 @@ "UMask": "0x1" }, { - "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn'= t come from the decoder.", + "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn'= t come from the decoder. [This event is alias to LSD.CYCLES_OK]", "CounterMask": "4", "EventCode": "0xA8", "EventName": "LSD.CYCLES_4_UOPS", - "PublicDescription": "Counts the cycles when 4 uops are delivered = by the LSD (Loop-stream detector).", + "PublicDescription": "Counts the cycles when 4 uops are delivered = by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -505,6 +505,15 @@ "SampleAfterValue": "2000003", "UMask": "0x1" }, + { + "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn'= t come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]", + "CounterMask": "4", + "EventCode": "0xA8", + "EventName": "LSD.CYCLES_OK", + "PublicDescription": "Counts the cycles when 4 uops are delivered = by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOP= S]", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, { "BriefDescription": "Number of Uops delivered by the LSD.", "EventCode": "0xA8", diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnec= t.json b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.js= on index 725780fb3990..1a342dff1503 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-interconnect.json @@ -6606,7 +6606,7 @@ "EventCode": "0x52", "EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS", "PerPkg": "1", - "PublicDescription": "ad and bl messages were actually slotted int= o the same flit in paralle", + "PublicDescription": "ad and bl messages were actually slotted int= o the same flit in parallel", "UMask": "0x8", "Unit": "M3UPI" }, diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json= b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json index f761856d738e..d82d2cca6f0a 100644 --- a/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/uncore-memory.json @@ -2735,7 +2735,7 @@ "EventCode": "0x81", "EventName": "UNC_M_WPQ_OCCUPANCY", "PerPkg": "1", - "PublicDescription": "Counts the number of entries in the Write Pe= nding Queue (WPQ) at each cycle. 
This can then be used to calculate both t= he average queue occupancy (in conjunction with the number of cycles not em= pty) and the average latency (in conjunction with the number of allocations= ). The WPQ is used to schedule writes out to the memory controller and to = track the requests. Requests allocate into the WPQ soon after they enter t= he memory controller, and need credits for an entry in this buffer before b= eing sent from the CHA to the iMC (memory controller). They deallocate aft= er being issued to DRAM. Write requests themselves are able to complete (f= rom the perspective of the rest of the system) as soon they have 'posted' t= o the iMC. This is not to be confused with actually performing the write t= o DRAM. Therefore, the average latency for this queue is actually not usef= ul for deconstruction intermediate write latencies. So, we provide filteri= ng based on if the request has posted or not. By using the 'not posted' fi= lter, we can track how long writes spent in the iMC before completions were= sent to the HA. The 'posted' filter, on the other hand, provides informat= ion about how much queueing is actually happening in the iMC for writes bef= ore they are actually issued to memory. High average occupancies will gene= rally coincide with high write major mode counts. Is there a filter of sort= s?", + "PublicDescription": "Counts the number of entries in the Write Pe= nding Queue (WPQ) at each cycle. This can then be used to calculate both t= he average queue occupancy (in conjunction with the number of cycles not em= pty) and the average latency (in conjunction with the number of allocations= ). The WPQ is used to schedule writes out to the memory controller and to = track the requests. Requests allocate into the WPQ soon after they enter t= he memory controller, and need credits for an entry in this buffer before b= eing sent from the CHA to the iMC (memory controller). They deallocate aft= er being issued to DRAM. Write requests themselves are able to complete (f= rom the perspective of the rest of the system) as soon they have 'posted' t= o the iMC. This is not to be confused with actually performing the write t= o DRAM. Therefore, the average latency for this queue is actually not usef= ul for deconstruction intermediate write latencies. So, we provide filteri= ng based on if the request has posted or not. By using the 'not posted' fi= lter, we can track how long writes spent in the iMC before completions were= sent to the HA. The 'posted' filter, on the other hand, provides informat= ion about how much queueing is actually happening in the iMC for writes bef= ore they are actually issued to memory. 
High average occupancies will gene= rally coincide with high write major mode counts.", "Unit": "iMC" }, { diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index de4832bddf05..eccc7ef98870 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -5,7 +5,7 @@ GenuineIntel-6-(1C|26|27|35|36),v4,bonnell,core GenuineIntel-6-(3D|47),v28,broadwell,core GenuineIntel-6-56,v10,broadwellde,core GenuineIntel-6-4F,v21,broadwellx,core -GenuineIntel-6-55-[56789ABCDEF],v1.18,cascadelakex,core +GenuineIntel-6-55-[56789ABCDEF],v1.19,cascadelakex,core GenuineIntel-6-9[6C],v1.04,elkhartlake,core GenuineIntel-6-5[CF],v13,goldmont,core GenuineIntel-6-7A,v1.01,goldmontplus,core --=20 2.41.0.162.gfafddb0af9-goog
From nobody Sat Feb 7 12:35:08 2026 Date: Fri, 23 Jun 2023 08:10:11 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-8-irogers@google.com> Subject: [PATCH v2 07/12] perf vendor events intel: Update icelake to 1.19 From: Ian Rogers
Updates were released in: https://github.com/intel/perfmon/commit/f3d841189f8964bc240c86301f4c8498456= 30b5b A number of events are deprecated and event descriptions updated. Adds events ICACHE_DATA.STALLS, ICACHE_TAG.STALLS and DECODE.LCP. Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- .../pmu-events/arch/x86/icelake/cache.json | 8 ++--- .../pmu-events/arch/x86/icelake/frontend.json | 32 ++++++++++++++++--- .../pmu-events/arch/x86/icelake/pipeline.json | 6 ++-- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 4 files changed, 36 insertions(+), 12 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/icelake/cache.json b/tools/perf= /pmu-events/arch/x86/icelake/cache.json index 79b9f02a4b63..d26c4efe35f0 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/cache.json +++ b/tools/perf/pmu-events/arch/x86/icelake/cache.json @@ -155,18 +155,18 @@ "UMask": "0x21" }, { - "BriefDescription": "All requests that miss L2 cache. This event i= s not supported on ICL and ICX products, only supported on RKL products.", + "BriefDescription": "This event is deprecated.", + "Deprecated": "1", "EventCode": "0x24", "EventName": "L2_RQSTS.MISS", - "PublicDescription": "Counts all requests that miss L2 cache. This= event is not supported on ICL and ICX products, only supported on RKL prod= ucts.", "SampleAfterValue": "200003", "UMask": "0x3f" }, { - "BriefDescription": "All L2 requests. This event is not supported = on ICL and ICX products, only supported on RKL products.", + "BriefDescription": "This event is deprecated.", + "Deprecated": "1", "EventCode": "0x24", "EventName": "L2_RQSTS.REFERENCES", - "PublicDescription": "Counts all L2 requests. This event is not su= pported on ICL and ICX products, only supported on RKL products.", "SampleAfterValue": "200003", "UMask": "0xff" }, diff --git a/tools/perf/pmu-events/arch/x86/icelake/frontend.json b/tools/p= erf/pmu-events/arch/x86/icelake/frontend.json index 3e3d2b002170..2b539a08d2bf 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/icelake/frontend.json @@ -7,6 +7,14 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. 
[This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to ILD_STALL.LCP]", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, { "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transition= s count.", "CounterMask": "1", @@ -213,10 +221,10 @@ "UMask": "0x1" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]", "EventCode": "0x80", "EventName": "ICACHE_16B.IFDATA_STALL", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity.", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -237,10 +245,26 @@ "UMask": "0x2" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", "EventCode": "0x83", "EventName": "ICACHE_64B.IFTAG_STALL", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.ST= ALLS]", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]", + "EventCode": "0x80", + "EventName": "ICACHE_DATA.STALLS", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]", + "SampleAfterValue": "500009", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STAL= L]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. 
[This event is alias to ICACHE_64B.IF= TAG_STALL]", "SampleAfterValue": "200003", "UMask": "0x4" }, diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/p= erf/pmu-events/arch/x86/icelake/pipeline.json index 154fee4b60fb..375b78044f14 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json @@ -318,10 +318,10 @@ "UMask": "0x40" }, { - "BriefDescription": "Stalls caused by changing prefix length of th= e instruction.", + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to DECODE.LCP]", "EventCode": "0x87", "EventName": "ILD_STALL.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to DECODE.LCP]", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -556,7 +556,7 @@ "BriefDescription": "TMA slots wasted due to incorrect speculation= by branch mispredictions", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by branch mispredictions. This event estimates number= of operations that were issued but not retired from the specualtive path a= s well as the out-of-order engine recovery past a branch misprediction.", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by branch mispredictions. 
This event estimates number= of operations that were issued but not retired from the speculative path a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8" }, diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index eccc7ef98870..d63c9df8f65d 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -13,7 +13,7 @@ GenuineIntel-6-B6,v1.00,grandridge,core GenuineIntel-6-A[DE],v1.01,graniterapids,core GenuineIntel-6-(3C|45|46),v33,haswell,core GenuineIntel-6-3F,v27,haswellx,core -GenuineIntel-6-7[DE],v1.18,icelake,core +GenuineIntel-6-7[DE],v1.19,icelake,core GenuineIntel-6-6[AC],v1.20,icelakex,core GenuineIntel-6-3A,v24,ivybridge,core GenuineIntel-6-3E,v23,ivytown,core --=20 2.41.0.162.gfafddb0af9-goog From nobody Sat Feb 7 12:35:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D732EB64DD for ; Fri, 23 Jun 2023 15:13:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231848AbjFWPND (ORCPT ); Fri, 23 Jun 2023 11:13:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232448AbjFWPL2 (ORCPT ); Fri, 23 Jun 2023 11:11:28 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BDB2F30D3 for ; Fri, 23 Jun 2023 08:11:03 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-be47a3b3a01so982612276.2 for ; Fri, 23 Jun 2023 08:11:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687533054; x=1690125054; h=content-transfer-encoding:to:from:subject:references:mime-version :message-id:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=IH93V5eIpnR2f14b4ZM+Ftn03PloFxrPP2IsbOx78io=; b=6tkCJbVnTb3RbhEhWfNrcbBleyTdLKcoH8NxrgI6w7nekLZmENgfS3c5q9jhBBHuTu iOTX6pePyvOtTYb8HD1K+j8YBNuqGfguWzNkUweGqAzyYeEqpCzHkT8Y8PX+EgazWAYk HaMTfKkFuo+60ken5pcC6J7OHVsUf6EAkWDjhoEZL106YiQF0R2pFgjTTfX7TxJx6+UZ Q/xfgWvYk503yvpF+87pWcyhZdhUZZii6ZxSvunC0IYYgqAIgBA9JDAyCEFtTiACmEbz p2dmlOG/5mEeXKW8dlHv56lszM8yq3nv6QtCHQ0j1FXvKCXgNG2Zl5nmF/C5Q2k7iujQ 0BqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687533054; x=1690125054; h=content-transfer-encoding:to:from:subject:references:mime-version :message-id:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=IH93V5eIpnR2f14b4ZM+Ftn03PloFxrPP2IsbOx78io=; b=VRVZWsjd4Bxr/dGSfhATMZhNB18WCPk1Lf2Qvae8E6N2US631zlmeTLiPtjkO1fwnC cBsT4jEG2S9GGtzKe/gdMAEvVTKNjZDBme8MuKCPeUbQnCdJa+FEEiHzj0W0ojm4k++6 Ni3+0nHlXkxBxs17+oY9fZbbITguGLtSQ924apmyQj7HpXpdqFbozbitym0sCIFvgIvd TBBHRffEJD/rrYMxFkIg3x7IBJULyrHFIp1oUQjb/7EJDRodnH/kOkZnA7um8YmV6eYh Ob+ZfHyh2NocWekSsbwc7/xMmfa9Hfd7HQVEPxIzCSkmYp4osWaDjrHkNkn12luuO5Yh 458w== X-Gm-Message-State: AC+VfDwf9MqHUmkIs5iHGGua7CgN/2E2lwlTzcyocKy9tqUNjYPXnfc4 eSxnuZofoz64iVkoFbosIBhnuwtIHhjW X-Google-Smtp-Source: ACHHUZ71YtrP9eh5uIUH9igGT/qEm2g0LFTYHAxBkIZbIHRYVa/8joz40XdnhBaCuGE4JQq7/AsUsZwZRbeo X-Received: from irogers.svl.corp.google.com ([2620:15c:2a3:200:6559:8968:cdfe:35b6]) (user=irogers job=sendgmr) by 
2002:a25:3444:0:b0:ba1:d0:7f7c with SMTP id b65-20020a253444000000b00ba100d07f7cmr3638441yba.2.1687533053943; Fri, 23 Jun 2023 08:10:53 -0700 (PDT) Date: Fri, 23 Jun 2023 08:10:12 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-9-irogers@google.com> Mime-Version: 1.0 References: <20230623151016.4193660-1-irogers@google.com> X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog Subject: [PATCH v2 08/12] perf vendor events intel: Update icelakex to 1.21 From: Ian Rogers To: Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Kan Liang , Zhengjun Xing , Kajol Jain , John Garry , Andrii Nakryiko , Eduard Zingerman , Jing Zhang , Sohom Datta , linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, Perry Taylor , Samantha Alt , Caleb Biggers , Weilin Wang , Edward Baker Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Updates were released in: https://github.com/intel/perfmon/commit/78d47cbbae48a0297a507ae4fea234ff37f= f9960 Adds the events ICACHE_DATA.STALLS, ICACHE_TAG.STALLS and DECODE.LCP. Descriptions are also updated. Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- .../arch/x86/icelakex/frontend.json | 32 ++++++++++++++++--- .../arch/x86/icelakex/pipeline.json | 4 +-- .../x86/icelakex/uncore-interconnect.json | 2 +- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- 4 files changed, 32 insertions(+), 8 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/icelakex/frontend.json b/tools/= perf/pmu-events/arch/x86/icelakex/frontend.json index 71498044f1cb..f6edc4222f42 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/frontend.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/frontend.json @@ -7,6 +7,14 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is= alias to ILD_STALL.LCP]", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, { "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transition= s count.", "CounterMask": "1", @@ -213,10 +221,10 @@ "UMask": "0x1" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]", "EventCode": "0x80", "EventName": "ICACHE_16B.IFDATA_STALL", - "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity.", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity. 
[This event is alias to ICACHE_DATA.STALLS]", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -237,10 +245,26 @@ "UMask": "0x2" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", "EventCode": "0x83", "EventName": "ICACHE_64B.IFTAG_STALL", - "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss.", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.ST= ALLS]", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]", + "EventCode": "0x80", + "EventName": "ICACHE_DATA.STALLS", + "PublicDescription": "Counts cycles where a code line fetch is sta= lled due to an L1 instruction cache miss. The legacy decode pipeline works = at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]", + "SampleAfterValue": "500009", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L= 1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STAL= L]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "PublicDescription": "Counts cycles where a code fetch is stalled = due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IF= TAG_STALL]", "SampleAfterValue": "200003", "UMask": "0x4" }, diff --git a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json b/tools/= perf/pmu-events/arch/x86/icelakex/pipeline.json index 442a4c7539dd..176e5ef2a24a 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/pipeline.json @@ -318,10 +318,10 @@ "UMask": "0x40" }, { - "BriefDescription": "Stalls caused by changing prefix length of th= e instruction.", + "BriefDescription": "Stalls caused by changing prefix length of th= e instruction. [This event is alias to DECODE.LCP]", "EventCode": "0x87", "EventName": "ILD_STALL.LCP", - "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk.", + "PublicDescription": "Counts cycles that the Instruction Length de= coder (ILD) stalls occurred due to dynamically changing prefix length of th= e decoded instruction (by operand size prefix instruction 0x66, address siz= e prefix instruction 0x67 or REX.W for Intel64). Count is proportional to t= he number of prefixes in a 16B-line. This may result in a three-cycle penal= ty for each LCP (Length changing prefix) in a 16-byte chunk. 
[This event is= alias to DECODE.LCP]", "SampleAfterValue": "500009", "UMask": "0x1" }, diff --git a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.js= on b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json index 8ac5907762e1..f87ea3f66d1b 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/uncore-interconnect.json @@ -9311,7 +9311,7 @@ "EventCode": "0x50", "EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS", "PerPkg": "1", - "PublicDescription": "Message Held : Parallel Success : ad and bl = messages were actually slotted into the same flit in paralle", + "PublicDescription": "Message Held : Parallel Success : ad and bl = messages were actually slotted into the same flit in parallel", "UMask": "0x8", "Unit": "M3UPI" }, diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index d63c9df8f65d..98828c3a9cde 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -14,7 +14,7 @@ GenuineIntel-6-A[DE],v1.01,graniterapids,core GenuineIntel-6-(3C|45|46),v33,haswell,core GenuineIntel-6-3F,v27,haswellx,core GenuineIntel-6-7[DE],v1.19,icelake,core -GenuineIntel-6-6[AC],v1.20,icelakex,core +GenuineIntel-6-6[AC],v1.21,icelakex,core GenuineIntel-6-3A,v24,ivybridge,core GenuineIntel-6-3E,v23,ivytown,core GenuineIntel-6-2D,v23,jaketown,core --=20 2.41.0.162.gfafddb0af9-goog
From nobody Sat Feb 7 12:35:08 2026 Date: Fri, 23 Jun 2023 08:10:13 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-10-irogers@google.com> Subject: [PATCH v2 09/12] perf vendor events intel: Update sapphirerapids to 1.14 From: Ian Rogers
Updates were released in: https://github.com/intel/perfmon/commit/a84850f1fec633002c35838ed34e51e1f0d= 6a2dd Adds a large number of CXL events like UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_DRD_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_RFO_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFRFO_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_RFO_PREF_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFDATA_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PREF_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFDATA_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFRFO_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_PREF_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_MISS_CXL_ACC, UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_MISS_CXL_ACC, UNC_CHA_TOR_OCCUPANCY.IA_HIT_CXL_ACC.
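On a matching machine the new events can then be checked and counted by name (illustrative commands, assuming a kernel that exposes the uncore_cha PMU; uncore events require system-wide counting, hence -a):

  $ perf list | grep -i IA_MISS_DRD_CXL_ACC
  $ perf stat -a -e UNC_CHA_TOR_INSERTS.IA_MISS_DRD_CXL_ACC sleep 1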
Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- .../arch/x86/sapphirerapids/pipeline.json | 2 +- .../arch/x86/sapphirerapids/uncore-cache.json | 308 ++++++++++++++++++ .../sapphirerapids/uncore-interconnect.json | 2 +- 4 files changed, 311 insertions(+), 3 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-ev= ents/arch/x86/mapfile.csv index 98828c3a9cde..f321b2cd83da 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -24,7 +24,7 @@ GenuineIntel-6-1[AEF],v3,nehalemep,core GenuineIntel-6-2E,v3,nehalemex,core GenuineIntel-6-A7,v1.01,rocketlake,core GenuineIntel-6-2A,v19,sandybridge,core -GenuineIntel-6-(8F|CF),v1.13,sapphirerapids,core +GenuineIntel-6-(8F|CF),v1.14,sapphirerapids,core GenuineIntel-6-AF,v1.00,sierraforest,core GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v56,skylake,core diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json b/= tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json index 72e9bdfa9f80..6dcf3b763af4 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/pipeline.json @@ -706,7 +706,7 @@ "BriefDescription": "TMA slots wasted due to incorrect speculation= by branch mispredictions", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. This event es= timates number of specualtive operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", + "PublicDescription": "Number of TMA slots that were wasted due to = incorrect speculation by (any type of) branch mispredictions. 
This event es= timates number of speculative operations that were issued but not retired a= s well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8" }, diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-cache.jso= n b/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-cache.json index b91cebf81f50..3fa660694bc7 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-cache.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-cache.json @@ -3156,6 +3156,23 @@ "UMask": "0xc88ffd01", "Unit": "CHA" }, + { + "BriefDescription": "All requests issued from IA cores to CXL acce= lerator memory regions that hit the LLC.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c0018101", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c0008101", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts; DRd hits from local IA", "EventCode": "0x35", @@ -3371,6 +3388,23 @@ "UMask": "0xc80f7e01", "Unit": "CHA" }, + { + "BriefDescription": "All requests issued from IA cores to CXL acce= lerator memory regions that miss the LLC.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c0018201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_CXL_ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c0008201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts for DRd misses from local IA", "EventCode": "0x35", @@ -3397,6 +3431,23 @@ "UMask": "0xc837fe01", "Unit": "CHA" }, + { + "BriefDescription": "DRds issued from an IA core which miss the L3= and target memory in a CXL type 2 memory expander card.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8178201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_CXL_ACC_LOCAL= ", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8168201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts for DRds issued by IA Cores targe= ting DDR Mem that Missed the LLC", "EventCode": "0x35", @@ -3442,6 +3493,15 @@ "UMask": "0xc827fe01", "Unit": "CHA" }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_OPT_CXL_ACC_L= OCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_OPT_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8268201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts; DRd Opt Pref misses from local I= A", "EventCode": "0x35", @@ -3451,6 +3511,15 @@ "UMask": "0xc8a7fe01", "Unit": "CHA" }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_OPT_PREF_CXL_= ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_OPT_PREF_CXL_ACC_LOC= AL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8a68201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts for DRds issued by iA Cores targe= ting PMM Mem that Missed the LLC", "EventCode": "0x35", @@ -3469,6 +3538,23 @@ "UMask": "0xc897fe01", "Unit": "CHA" }, + { + "BriefDescription": "L2 data prefetches issued from 
an IA core whi= ch miss the L3 and target memory in a CXL type 2 accelerator.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8978201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_CXL_ACC_= LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_PREF_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8968201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts : DRd_Prefs issued by iA Cores ta= rgeting DDR Mem that Missed the LLC", "EventCode": "0x35", @@ -3603,6 +3689,23 @@ "UMask": "0xccd7fe01", "Unit": "CHA" }, + { + "BriefDescription": "LLC data prefetches issued from an IA core wh= ich miss the L3 and target memory in a CXL type 2 accelerator.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFDATA_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10ccd78201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFDATA_CXL_A= CC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFDATA_CXL_ACC_LOCA= L", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10ccd68201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts; LLCPrefRFO misses from local IA", "EventCode": "0x35", @@ -3612,6 +3715,23 @@ "UMask": "0xccc7fe01", "Unit": "CHA" }, + { + "BriefDescription": "L2 RFO prefetches issued from an IA core whic= h miss the L3 and target memory in a CXL type 2 accelerator.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFRFO_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8878201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFRFO_CXL_AC= C_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_LLCPREFRFO_CXL_ACC_LOCAL= ", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8868201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts : WCiLFs issued by iA Cores targe= ting DDR that missed the LLC - HOMed locally", "EventCode": "0x35", @@ -3701,6 +3821,23 @@ "UMask": "0x10c8038201", "Unit": "CHA" }, + { + "BriefDescription": "RFOs issued from an IA core which miss the L3= and target memory in a CXL type 2 accelerator.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_RFO_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8078201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_RFO_CXL_ACC_LOCAL= ", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_RFO_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8068201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts RFO misses from local IA", "EventCode": "0x35", @@ -3719,6 +3856,23 @@ "UMask": "0xc887fe01", "Unit": "CHA" }, + { + "BriefDescription": "LLC RFO prefetches issued from an IA core whi= ch miss the L3 and target memory in a CXL type 2 accelerator.", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_RFO_PREF_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10ccc78201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_INSERTS.IA_MISS_RFO_PREF_CXL_ACC_= LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_INSERTS.IA_MISS_RFO_PREF_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10ccc68201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Inserts; RFO prefetch misses from local I= A", "EventCode": "0x35", @@ -4427,6 +4581,23 @@ "UMask": "0xc88ffd01", "Unit": "CHA" }, + { + 
"BriefDescription": "TOR Occupancy for All requests issued from IA= cores to CXL accelerator memory regions that hit the LLC.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_HIT_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c0018101", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_HIT_CXL_ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_HIT_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c0008101", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy; DRd hits from local IA", "EventCode": "0x36", @@ -4644,6 +4815,23 @@ "UMask": "0xc80f7e01", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for All requests issued from IA= cores to CXL accelerator memory regions that miss the LLC.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c0018201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_CXL_ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c0008201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy for DRd misses from local IA", "EventCode": "0x36", @@ -4672,6 +4860,23 @@ "UMask": "0xc837fe01", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for DRds and equivalent opcodes= issued from an IA core which miss the L3 and target memory in a CXL type 2= memory expander card.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8178201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_CXL_ACC_LOC= AL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8168201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy for DRds issued by iA Cores tar= geting DDR Mem that Missed the LLC", "EventCode": "0x36", @@ -4717,6 +4922,15 @@ "UMask": "0xc827fe01", "Unit": "CHA" }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT_CXL_ACC= _LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8268201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy; DRd Opt Pref misses from local= IA", "EventCode": "0x36", @@ -4726,6 +4940,15 @@ "UMask": "0xc8a7fe01", "Unit": "CHA" }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT_PREF_CX= L_ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT_PREF_CXL_ACC_L= OCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8a68201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy for DRds issued by iA Cores tar= geting PMM Mem that Missed the LLC", "EventCode": "0x36", @@ -4744,6 +4967,23 @@ "UMask": "0xc897fe01", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for L2 data prefetches issued f= rom an IA core which miss the L3 and target memory in a CXL type 2 accelera= tor.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PREF_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8978201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PREF_CXL_AC= C_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_PREF_CXL_ACC_LOCAL= ", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": 
"0x10c8968201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy : DRd_Prefs issued by iA Cores = targeting DDR Mem that Missed the LLC", "EventCode": "0x36", @@ -4878,6 +5118,23 @@ "UMask": "0xccd7fe01", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for LLC data prefetches issued = from an IA core which miss the L3 and target memory in a CXL type 2 acceler= ator.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFDATA_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10ccd78201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFDATA_CXL= _ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFDATA_CXL_ACC_LO= CAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10ccd68201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy; LLCPrefRFO misses from local I= A", "EventCode": "0x36", @@ -4887,6 +5144,23 @@ "UMask": "0xccc7fe01", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for L2 RFO prefetches issued fr= om an IA core which miss the L3 and target memory in a CXL type 2 accelerat= or.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFRFO_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8878201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFRFO_CXL_= ACC_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_LLCPREFRFO_CXL_ACC_LOC= AL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8868201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy : WCiLFs issued by iA Cores tar= geting DDR that missed the LLC - HOMed locally", "EventCode": "0x36", @@ -4976,6 +5250,23 @@ "UMask": "0x10c8038201", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for RFOs issued from an IA core= which miss the L3 and target memory in a CXL type 2 accelerator.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10c8078201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_CXL_ACC_LOC= AL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_CXL_ACC_LOCAL", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10c8068201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy; RFO misses from local IA", "EventCode": "0x36", @@ -4994,6 +5285,23 @@ "UMask": "0xc887fe01", "Unit": "CHA" }, + { + "BriefDescription": "TOR Occupancy for LLC RFO prefetches issued f= rom an IA core which miss the L3 and target memory in a CXL type 2 accelera= tor.", + "EventCode": "0x36", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_PREF_CXL_ACC", + "PerPkg": "1", + "UMask": "0x10ccc78201", + "Unit": "CHA" + }, + { + "BriefDescription": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_PREF_CXL_AC= C_LOCAL", + "EventCode": "0x35", + "EventName": "UNC_CHA_TOR_OCCUPANCY.IA_MISS_RFO_PREF_CXL_ACC_LOCAL= ", + "PerPkg": "1", + "PortMask": "0x000", + "UMask": "0x10ccc68201", + "Unit": "CHA" + }, { "BriefDescription": "TOR Occupancy; RFO prefetch misses from local= IA", "EventCode": "0x36", diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-interconn= ect.json b/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-interconnec= t.json index 6800de05c836..09d840c7da4c 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/uncore-interconnect.json @@ -3326,7 +3326,7 @@ "EventCode": "0x50", "EventName": 
"UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS", "PerPkg": "1", - "PublicDescription": "Message Held : Parallel Success : ad and bl = messages were actually slotted into the same flit in paralle", + "PublicDescription": "Message Held : Parallel Success : ad and bl = messages were actually slotted into the same flit in parallel", "UMask": "0x8", "Unit": "M3UPI" }, --=20 2.41.0.162.gfafddb0af9-goog From nobody Sat Feb 7 12:35:09 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5ABA2EB64DD for ; Fri, 23 Jun 2023 15:13:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232023AbjFWPNA (ORCPT ); Fri, 23 Jun 2023 11:13:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58034 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232585AbjFWPLl (ORCPT ); Fri, 23 Jun 2023 11:11:41 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E63E23592 for ; Fri, 23 Jun 2023 08:11:06 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-56942442eb0so10485227b3.1 for ; Fri, 23 Jun 2023 08:11:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687533059; x=1690125059; h=content-transfer-encoding:to:from:subject:references:mime-version :message-id:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=jhkdSyMhMNyUWrs1ZBCpoFwWPpdU98HpFzjjfUuXr6w=; b=5k8KmnYkv37E2sQ+CDcAJEYxMggzjSfSxn/qF7Tdy0tuRh8Z/fYdUWeuymvDqQTAxh nwxkip8SOxicfWhcJOM6OFAK/qn4WfREzLPjliq8BZErwUW07s7oor6x5JaCDwJMAPto O1SDBiJK46hr7Kn6+LJIN6fb6M49SO3ARuGjEiw+WDYjfukKUbysqhEY+0Z8+PhzFtYM WqZDeDOfntcEXGflkp451NcvwvqJt4NgPoFiVPFPxzoU6MmH5s/lFK+RTiMrn3W0XGAJ XEvFCgJWYqxNoRr+LJA0xO7dUoiXf6rZQ03mEwiSUfiCmyePLxICfZa+JNgQhNKS+7M1 iDgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687533059; x=1690125059; h=content-transfer-encoding:to:from:subject:references:mime-version :message-id:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=jhkdSyMhMNyUWrs1ZBCpoFwWPpdU98HpFzjjfUuXr6w=; b=A3Dt8iNJEyiVD8rhXKW0rtH6R0F8U9kzJhFz5KoOGm61gNFnAO+2BN9pmuN0tUbSNP ey66B008S4cVIRZUFyHPmDZWmW45aFcCLZM90QyWjdam6e4bIiyxliH24v12xmBhlwZo VDXLD/7ij1c9WHpOA0pL6v1W5dU3qtcb/Kn3p/neBHb0MBCgVHIm6wP8Q0JBaBMDZJpL 7AVS37q5ZASEboGw1zys4J3aYEJTR2CXnu4faZ5VZPpMgzQFPPtddN62CDQ6vRfkF6FJ B1LO6i6QTBD5l21wZ3ubAS9mocFTJGS8/pnQxC8RLCcuqj7jZUp3nwH2nqe7Cr/0/NL/ s63Q== X-Gm-Message-State: AC+VfDxCXjSbqD7jzkMpJN/Kl0LOwAmsqIVnzYOPB3yLMDHtslcNdBO7 8ZqynGgTN7DMMpVqVvWJjQAnr8Zn0Cpk X-Google-Smtp-Source: ACHHUZ7xs4izjdtsD2KbdhcXVHURYkBVoNoMVyXL+1fVhzFm6+yzeTXxYnifvTeXt9pBErB+ddtpuE4Jtett X-Received: from irogers.svl.corp.google.com ([2620:15c:2a3:200:6559:8968:cdfe:35b6]) (user=irogers job=sendgmr) by 2002:a81:af59:0:b0:573:3897:c925 with SMTP id x25-20020a81af59000000b005733897c925mr6372531ywj.6.1687533059320; Fri, 23 Jun 2023 08:10:59 -0700 (PDT) Date: Fri, 23 Jun 2023 08:10:14 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-11-irogers@google.com> Mime-Version: 1.0 References: <20230623151016.4193660-1-irogers@google.com> X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog Subject: [PATCH v2 10/12] perf vendor events intel: 
Update skylake to 57 From: Ian Rogers
Updates were released in: https://github.com/intel/perfmon/commit/1c3042c13bbfea05abe1ebb6910ae58b2172e9ef Adds the events IDQ.DSB_CYCLES_OK, IDQ.DSB_CYCLES_ANY, ICACHE_TAG.STALLS, DECODE.LCP, LSD.CYCLES_OK. Descriptions are also updated. Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- .../pmu-events/arch/x86/skylake/frontend.json | 43 ++++++++++++++++--- .../pmu-events/arch/x86/skylake/pipeline.json | 17 ++++++-- 3 files changed, 52 insertions(+), 10 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv index f321b2cd83da..5104b93d57ab 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -27,7 +27,7 @@ GenuineIntel-6-2A,v19,sandybridge,core GenuineIntel-6-(8F|CF),v1.14,sapphirerapids,core GenuineIntel-6-AF,v1.00,sierraforest,core GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core -GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v56,skylake,core +GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v57,skylake,core GenuineIntel-6-55-[01234],v1.30,skylakex,core GenuineIntel-6-86,v1.21,snowridgex,core GenuineIntel-6-8[CD],v1.12,tigerlake,core diff --git a/tools/perf/pmu-events/arch/x86/skylake/frontend.json b/tools/perf/pmu-events/arch/x86/skylake/frontend.json index 04f08e4d2402..095904c77001 100644 --- a/tools/perf/pmu-events/arch/x86/skylake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/skylake/frontend.json @@ -7,6 +7,14 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, { "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches", "EventCode": "0xAB", @@ -245,27 +253,34 @@ "UMask": "0x2" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", "EventCode": "0x83", "EventName": "ICACHE_64B.IFTAG_STALL", "SampleAfterValue": "200003", "UMask": "0x4" }, { - "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops", + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.
[This event is alias to ICACHE_64B.IFTAG_STALL]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]", "CounterMask": "4", "EventCode": "0x79", "EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS", - "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.", + "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_OK]", "SampleAfterValue": "2000003", "UMask": "0x18" }, { - "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop", + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]", "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS", - "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.", + "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_ANY]", "SampleAfterValue": "2000003", "UMask": "0x18" }, @@ -296,6 +311,24 @@ "SampleAfterValue": "2000003", "UMask": "0x8" }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x18" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]", + "CounterMask": "4", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_OK", + "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x18" + }, { "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path", "EventCode": "0x79", diff --git a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json index cc800fb8180a..cd3e737bf4a1 100644 --- a/tools/perf/pmu-events/arch/x86/skylake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/skylake/pipeline.json @@ -352,10 +352,10 @@ "UMask": "0x1" }, { - "BriefDescription": "Stalls caused by changing prefix length of the instruction.", + "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]", "EventCode": "0x87", "EventName": "ILD_STALL.LCP", - "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.", + "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -479,11 +479,11 @@ "UMask": "0x1" }, { - "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder.", + "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_OK]", "CounterMask": "4", "EventCode": "0xA8", "EventName": "LSD.CYCLES_4_UOPS", - "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector).", + "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -496,6 +496,15 @@ "SampleAfterValue": "2000003", "UMask": "0x1" }, + { + "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]", + "CounterMask": "4", + "EventCode": "0xA8", + "EventName": "LSD.CYCLES_OK", + "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, { "BriefDescription": "Number of Uops delivered by the LSD.", "EventCode": "0xA8",
-- 2.41.0.162.gfafddb0af9-goog
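The "[This event is alias to ...]" notes added throughout mean the old and new names program the same counter; only the name differs. A quick self-contained check of that claim against the encodings in the hunk above (the struct is a sketch for illustration, not a perf-tools type):

	#include <assert.h>

	struct encoding { unsigned event_code, umask, counter_mask; };

	int main(void)
	{
		/* Fields copied from the skylake frontend.json hunk above. */
		struct encoding all_dsb_cycles_4_uops = { 0x79, 0x18, 4 }; /* IDQ.ALL_DSB_CYCLES_4_UOPS */
		struct encoding dsb_cycles_ok         = { 0x79, 0x18, 4 }; /* IDQ.DSB_CYCLES_OK (new alias) */

		/* Identical programming: either name selects the same hardware event. */
		assert(all_dsb_cycles_4_uops.event_code   == dsb_cycles_ok.event_code);
		assert(all_dsb_cycles_4_uops.umask        == dsb_cycles_ok.umask);
		assert(all_dsb_cycles_4_uops.counter_mask == dsb_cycles_ok.counter_mask);
		return 0;
	}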
From nobody Sat Feb 7 12:35:09 2026 Date: Fri, 23 Jun 2023 08:10:15 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-12-irogers@google.com> References: <20230623151016.4193660-1-irogers@google.com> Subject: [PATCH v2 11/12] perf vendor events intel: Update skylakex to 1.31 From: Ian Rogers
Updates were released in: https://github.com/intel/perfmon/commit/cdaa69afe7a48a217b1d89320a27efc6e650cec3 Adds the events IDQ.DSB_CYCLES_OK, IDQ.DSB_CYCLES_ANY, ICACHE_TAG.STALLS, DECODE.LCP, LSD.CYCLES_OK. Descriptions are also updated. Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- .../arch/x86/skylakex/frontend.json | 43 ++++++++++++++++--- .../arch/x86/skylakex/pipeline.json | 17 ++++++-- .../x86/skylakex/uncore-interconnect.json | 2 +- .../arch/x86/skylakex/uncore-memory.json | 2 +- 5 files changed, 54 insertions(+), 12 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv index 5104b93d57ab..7c6598a9b240 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -28,7 +28,7 @@ GenuineIntel-6-(8F|CF),v1.14,sapphirerapids,core GenuineIntel-6-AF,v1.00,sierraforest,core GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v57,skylake,core -GenuineIntel-6-55-[01234],v1.30,skylakex,core +GenuineIntel-6-55-[01234],v1.31,skylakex,core GenuineIntel-6-86,v1.21,snowridgex,core GenuineIntel-6-8[CD],v1.12,tigerlake,core GenuineIntel-6-2C,v4,westmereep-dp,core diff --git a/tools/perf/pmu-events/arch/x86/skylakex/frontend.json b/tools/perf/pmu-events/arch/x86/skylakex/frontend.json index 04f08e4d2402..095904c77001 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/frontend.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/frontend.json @@ -7,6 +7,14 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, { "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches", "EventCode": "0xAB", @@ -245,27 +253,34 @@ "UMask": "0x2" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", "EventCode": "0x83", "EventName": "ICACHE_64B.IFTAG_STALL", "SampleAfterValue": "200003", "UMask": "0x4" }, { - "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops", + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.
[This event is alias to ICACHE_64B.IFTAG_STALL]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]", "CounterMask": "4", "EventCode": "0x79", "EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS", - "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.", + "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_OK]", "SampleAfterValue": "2000003", "UMask": "0x18" }, { - "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop", + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]", "CounterMask": "1", "EventCode": "0x79", "EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS", - "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.", + "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_ANY]", "SampleAfterValue": "2000003", "UMask": "0x18" }, @@ -296,6 +311,24 @@ "SampleAfterValue": "2000003", "UMask": "0x8" }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]", + "CounterMask": "1", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_ANY", + "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x18" + }, + { + "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]", + "CounterMask": "4", + "EventCode": "0x79", + "EventName": "IDQ.DSB_CYCLES_OK", + "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x18" + }, { "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path", "EventCode": "0x79", diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json index 31a1663d57f8..66d686cc933e 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json @@ -361,10 +361,10 @@ "UMask": "0x1" }, { - "BriefDescription": "Stalls caused by changing prefix length of the instruction.", + "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]", "EventCode": "0x87", "EventName": "ILD_STALL.LCP", - "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.", + "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -488,11 +488,11 @@ "UMask": "0x1" }, { - "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder.", + "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_OK]", "CounterMask": "4", "EventCode": "0xA8", "EventName": "LSD.CYCLES_4_UOPS", - "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector).", + "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]", "SampleAfterValue": "2000003", "UMask": "0x1" }, @@ -505,6 +505,15 @@ "SampleAfterValue": "2000003", "UMask": "0x1" }, + { + "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]", + "CounterMask": "4", + "EventCode": "0xA8", + "EventName": "LSD.CYCLES_OK", + "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOPS]", + "SampleAfterValue": "2000003", + "UMask": "0x1" + }, { "BriefDescription": "Number of Uops delivered by the LSD.", "EventCode": "0xA8", diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json index 26a5a20bf37a..3eece8a728b5 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json @@ -6504,7 +6504,7 @@ "EventCode": "0x52", "EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS", "PerPkg": "1", - "PublicDescription": "ad and bl messages were actually slotted into the same flit in paralle", + "PublicDescription": "ad and bl messages were actually slotted into the same flit in parallel", "UMask": "0x8", "Unit": "M3UPI" }, diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json index 6f8ff2262ce7..7a40aa0f1018 100644 --- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json +++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json @@ -1952,7 +1952,7 @@ "EventCode": "0x81", "EventName": "UNC_M_WPQ_OCCUPANCY", "PerPkg": "1", - "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle.
This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts?", + "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts.", "Unit": "iMC" }, {
-- 2.41.0.162.gfafddb0af9-goog
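The UNC_M_WPQ_OCCUPANCY description above spells out two derived metrics: average queue occupancy (occupancy divided by not-empty cycles) and average latency (occupancy divided by allocations, i.e. Little's law). A minimal sketch of that arithmetic, assuming the counts have already been read out; the companion counter roles and the sample values are illustrative, not part of the patch:

	#include <stdio.h>

	int main(void)
	{
		/* Placeholder counts for one iMC over a measurement interval. */
		double wpq_occupancy    = 6.0e8; /* UNC_M_WPQ_OCCUPANCY: entries summed each cycle */
		double cycles_not_empty = 2.5e8; /* companion not-empty cycles count (assumed) */
		double allocations      = 4.0e7; /* companion WPQ insert count (assumed) */

		/* Average depth while busy, and average residency per write. */
		printf("avg WPQ occupancy: %.2f entries\n", wpq_occupancy / cycles_not_empty);
		printf("avg WPQ latency:   %.2f cycles\n", wpq_occupancy / allocations);
		return 0;
	}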
From nobody Sat Feb 7 12:35:09 2026 Date: Fri, 23 Jun 2023 08:10:16 -0700 In-Reply-To: <20230623151016.4193660-1-irogers@google.com> Message-Id: <20230623151016.4193660-13-irogers@google.com> References: <20230623151016.4193660-1-irogers@google.com> Subject: [PATCH v2 12/12] perf vendor events intel: Update tigerlake to 1.13 From: Ian Rogers
Updates were released in: https://github.com/intel/perfmon/commit/9a3cd5ad68aee46078c663fe0cd9484e3956fd88 Adds the events ICACHE_DATA.STALLS, ICACHE_TAG.STALLS and DECODE.LCP. Descriptions are also updated. Signed-off-by: Ian Rogers Tested-by: Namhyung Kim --- tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +- .../arch/x86/tigerlake/frontend.json | 32 ++++++++++++++++--- .../arch/x86/tigerlake/pipeline.json | 6 ++-- 3 files changed, 32 insertions(+), 8 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv index 7c6598a9b240..6650100830c4 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -30,7 +30,7 @@ GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v57,skylake,core GenuineIntel-6-55-[01234],v1.31,skylakex,core GenuineIntel-6-86,v1.21,snowridgex,core -GenuineIntel-6-8[CD],v1.12,tigerlake,core +GenuineIntel-6-8[CD],v1.13,tigerlake,core GenuineIntel-6-2C,v4,westmereep-dp,core GenuineIntel-6-25,v3,westmereep-sp,core GenuineIntel-6-2F,v3,westmereex,core diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json b/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json index 23b8528590b3..d7b972452c0e 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/frontend.json @@ -7,6 +7,14 @@ "SampleAfterValue": "100003", "UMask": "0x1" }, + { + "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]", + "EventCode": "0x87", + "EventName": "DECODE.LCP", + "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]", + "SampleAfterValue": "500009", + "UMask": "0x1" + }, { "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.", "CounterMask": "1", @@ -213,10 +221,10 @@ "UMask": "0x1" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]", "EventCode": "0x80", "EventName": "ICACHE_16B.IFDATA_STALL", - "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity.", + "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_DATA.STALLS]", "SampleAfterValue": "500009", "UMask": "0x4" }, @@ -237,10 +245,26 @@ "UMask": "0x2" }, { - "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.", + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", "EventCode": "0x83", "EventName": "ICACHE_64B.IFTAG_STALL", - "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss.", + "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", + "SampleAfterValue": "200003", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]", + "EventCode": "0x80", + "EventName": "ICACHE_DATA.STALLS", + "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. [This event is alias to ICACHE_16B.IFDATA_STALL]", + "SampleAfterValue": "500009", + "UMask": "0x4" + }, + { + "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]", + "EventCode": "0x83", + "EventName": "ICACHE_TAG.STALLS", + "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]", "SampleAfterValue": "200003", "UMask": "0x4" }, diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json index 020801cbd7e3..541bf1dd1679 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json @@ -335,10 +335,10 @@ "UMask": "0x80" }, { - "BriefDescription": "Stalls caused by changing prefix length of the instruction.", + "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]", "EventCode": "0x87", "EventName": "ILD_STALL.LCP", - "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.", + "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]", "SampleAfterValue": "500009", "UMask": "0x1" }, @@ -564,7 +564,7 @@ "BriefDescription": "TMA slots wasted due to incorrect speculation by branch mispredictions", "EventCode": "0xa4", "EventName": "TOPDOWN.BR_MISPREDICT_SLOTS", - "PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions.
This event estimates number of operations that were issued but not retired from the specualtive path as well as the out-of-order engine recovery past a branch misprediction.", + "PublicDescription": "Number of TMA slots that were wasted due to incorrect speculation by branch mispredictions. This event estimates number of operations that were issued but not retired from the speculative path as well as the out-of-order engine recovery past a branch misprediction.", "SampleAfterValue": "10000003", "UMask": "0x8" },
-- 2.41.0.162.gfafddb0af9-goog
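The corrected TOPDOWN.BR_MISPREDICT_SLOTS description is what feeds the bad-speculation share of a top-down breakdown. As a closing illustration, the ratio it implies; the total-slots counter role and the sample values are assumptions, not from the patch:

	#include <stdio.h>

	int main(void)
	{
		/* Placeholder counts (not measured data). */
		double br_mispredict_slots = 1.2e8; /* TOPDOWN.BR_MISPREDICT_SLOTS */
		double total_slots         = 4.0e9; /* total TMA issue slots (assumed) */

		/* Share of slots lost to the speculative path plus recovery after
		 * branch mispredictions, per the description above. */
		printf("bad speculation (branch mispredicts): %.1f%%\n",
		       100.0 * br_mispredict_slots / total_slots);
		return 0;
	}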