From nobody Sat Feb 7 21:30:39 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E340EB64D9 for ; Thu, 15 Jun 2023 13:55:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343775AbjFONzm (ORCPT ); Thu, 15 Jun 2023 09:55:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344624AbjFONyo (ORCPT ); Thu, 15 Jun 2023 09:54:44 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 336151FCC; Thu, 15 Jun 2023 06:54:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1686837280; x=1718373280; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ibISdxCxH3dZZE/8tBcdYVkI1ZH7ec4GMtnaeXkhgZQ=; b=RLp743/0DgIfUjIJD3C9f4Lczeb24nGDTzMmLsPYg0ADOPpz9raPHYEE IJJpL+kiomu9YdIesQrxh0q5ETvjoPFk9eCIE6HXXaqXZfkzs7tmZJ6/B 01gyWTakkZC3u2hr0i+3i9PAf9JAGsOV+L4cOthzSAwZkXzl0agi/V0Ae mmrQGn4bkP+oY1lBGiKjsYUOYCMD6RhSMpqhoFxhVUDsNwq8f3KZDa5Ff 1bAu4IdC94dcwNNyJp+QTnZIPmSIp0+kcUDsv71kDWyoHAqHOL0jV9tDN 7IFUTOXtuuWNJKU0vEWmRboMSeMgkbCxz7nEzXSYR50EPbh+Una1+VJ17 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10742"; a="356411393" X-IronPort-AV: E=Sophos;i="6.00,245,1681196400"; d="scan'208";a="356411393" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jun 2023 06:54:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10742"; a="782527050" X-IronPort-AV: E=Sophos;i="6.00,245,1681196400"; d="scan'208";a="782527050" Received: from kanliang-dev.jf.intel.com ([10.165.154.102]) by fmsmga004.fm.intel.com with ESMTP; 15 Jun 2023 06:54:38 -0700 From: kan.liang@linux.intel.com To: acme@kernel.org, mingo@redhat.com, peterz@infradead.org, irogers@google.com, namhyung@kernel.org, jolsa@kernel.org, adrian.hunter@intel.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org Cc: ak@linux.intel.com, eranian@google.com, ahmad.yasin@intel.com, Kan Liang Subject: [PATCH V3 2/8] perf metric: JSON flag to default metric group Date: Thu, 15 Jun 2023 06:53:09 -0700 Message-Id: <20230615135315.3662428-3-kan.liang@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230615135315.3662428-1-kan.liang@linux.intel.com> References: <20230615135315.3662428-1-kan.liang@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Kan Liang For the default output, the default metric group could vary on different platforms. For example, on SPR, the TopdownL1 and TopdownL2 metrics should be displayed in the default mode. On ICL, only the TopdownL1 should be displayed. Add a flag so we can tag the default metric group for different platforms rather than hack the perf code. The flag is added to Intel TopdownL1 since ICL and ADL, TopdownL2 metrics since SPR. Add a new field, DefaultMetricgroupName, in the JSON file to indicate the real metric group name. Reviewed-by: Ian Rogers Signed-off-by: Kan Liang --- .../arch/x86/alderlake/adl-metrics.json | 45 ++++++++------ .../arch/x86/alderlaken/adln-metrics.json | 25 ++++---- .../arch/x86/icelake/icl-metrics.json | 20 ++++--- .../arch/x86/icelakex/icx-metrics.json | 20 ++++--- .../arch/x86/sapphirerapids/spr-metrics.json | 60 +++++++++++-------- .../arch/x86/tigerlake/tgl-metrics.json | 20 ++++--- 6 files changed, 114 insertions(+), 76 deletions(-) diff --git a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json b/to= ols/perf/pmu-events/arch/x86/alderlake/adl-metrics.json index c9f7e3d4ab08..85fb975b6f56 100644 --- a/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json @@ -129,33 +129,36 @@ }, { "BriefDescription": "Counts the total number of issue slots that = were not consumed by the backend due to backend stalls", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that uops mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. The rest of t= hese subevents count backend stalls, in cycles, due to an outstanding reque= st which is memory bound vs core bound. The subevents are not slot based = events and therefore can not be precisely added or subtracted from the Back= end_Bound_Aux subevents which are slot based.", "ScaleUnit": "100%", "Unit": "cpu_atom" }, { "BriefDescription": "Counts the total number of issue slots that = were not consumed by the backend due to backend stalls", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "tma_backend_bound", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound_aux", "MetricThreshold": "tma_backend_bound_aux > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that UOPS mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. All of these s= ubevents count backend stalls, in slots, due to a resource limitation. Th= ese are not cycle based events and therefore can not be precisely added or = subtracted from the Backend_Bound subevents which are cycle based. These s= ubevents are supplementary to Backend_Bound and can be used to analyze resu= lts from a resource perspective at allocation.", "ScaleUnit": "100%", "Unit": "cpu_atom" }, { "BriefDescription": "Counts the total number of issue slots that w= ere not consumed by the backend because allocation is stalled due to a misp= redicted jump or a machine clear", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "(tma_info_core_slots - (cpu_atom@TOPDOWN_FE_BOUND.A= LL@ + cpu_atom@TOPDOWN_BE_BOUND.ALL@ + cpu_atom@TOPDOWN_RETIRING.ALL@)) / t= ma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "Counts the total number of issue slots that = were not consumed by the backend because allocation is stalled due to a mis= predicted jump or a machine clear. Only issue slots wasted due to fast nuke= s such as memory ordering nukes are counted. Other nukes are not accounted = for. Counts all issue slots blocked during this recovery window including r= elevant microcode flows and while uops are not yet available in the instruc= tion queue (IQ). Also includes the issue slots that were consumed by the ba= ckend but were thrown away because they were younger than the mispredict or= machine clear.", "ScaleUnit": "100%", "Unit": "cpu_atom" @@ -295,11 +298,12 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to frontend stalls.", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -722,11 +726,12 @@ }, { "BriefDescription": "Counts the numer of issue slots that result = in retirement slots.", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.75", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "ScaleUnit": "100%", "Unit": "cpu_atom" }, @@ -832,22 +837,24 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "cpu_core@topdown\\-be\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%", "Unit": "cpu_core" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -1112,11 +1119,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "cpu_core@topdown\\-fe\\-bound@ / (cpu_core@topdown\= \-fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retirin= g@ + cpu_core@topdown\\-be\\-bound@) - cpu_core@INT_MISC.UOP_DROPPING@ / tm= a_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%", "Unit": "cpu_core" @@ -2316,11 +2324,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "cpu_core@topdown\\-retiring@ / (cpu_core@topdown\\-= fe\\-bound@ + cpu_core@topdown\\-bad\\-spec@ + cpu_core@topdown\\-retiring@= + cpu_core@topdown\\-be\\-bound@) + 0 * tma_info_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%", "Unit": "cpu_core" diff --git a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json b/= tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json index ed9ff25a03cf..0f1628d698da 100644 --- a/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json +++ b/tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json @@ -94,31 +94,34 @@ }, { "BriefDescription": "Counts the total number of issue slots that = were not consumed by the backend due to backend stalls", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that uops mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. The rest of t= hese subevents count backend stalls, in cycles, due to an outstanding reque= st which is memory bound vs core bound. The subevents are not slot based = events and therefore can not be precisely added or subtracted from the Back= end_Bound_Aux subevents which are slot based.", "ScaleUnit": "100%" }, { "BriefDescription": "Counts the total number of issue slots that = were not consumed by the backend due to backend stalls", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "tma_backend_bound", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound_aux", "MetricThreshold": "tma_backend_bound_aux > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "Counts the total number of issue slots that= were not consumed by the backend due to backend stalls. Note that UOPS mu= st be available for consumption in order for this event to count. If a uop= is not available (IQ is empty), this event will not count. All of these s= ubevents count backend stalls, in slots, due to a resource limitation. Th= ese are not cycle based events and therefore can not be precisely added or = subtracted from the Backend_Bound subevents which are cycle based. These s= ubevents are supplementary to Backend_Bound and can be used to analyze resu= lts from a resource perspective at allocation.", "ScaleUnit": "100%" }, { "BriefDescription": "Counts the total number of issue slots that w= ere not consumed by the backend because allocation is stalled due to a misp= redicted jump or a machine clear", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "(tma_info_core_slots - (TOPDOWN_FE_BOUND.ALL + TOPD= OWN_BE_BOUND.ALL + TOPDOWN_RETIRING.ALL)) / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "Counts the total number of issue slots that = were not consumed by the backend because allocation is stalled due to a mis= predicted jump or a machine clear. Only issue slots wasted due to fast nuke= s such as memory ordering nukes are counted. Other nukes are not accounted = for. Counts all issue slots blocked during this recovery window including r= elevant microcode flows and while uops are not yet available in the instruc= tion queue (IQ). Also includes the issue slots that were consumed by the ba= ckend but were thrown away because they were younger than the mispredict or= machine clear.", "ScaleUnit": "100%" }, @@ -243,11 +246,12 @@ }, { "BriefDescription": "Counts the number of issue slots that were n= ot consumed by the backend due to frontend stalls.", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "ScaleUnit": "100%" }, { @@ -612,11 +616,12 @@ }, { "BriefDescription": "Counts the numer of issue slots that result = in retirement slots.", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots", - "MetricGroup": "TopdownL1;tma_L1_group", + "MetricGroup": "Default;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.75", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "ScaleUnit": "100%" }, { diff --git a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json b/tool= s/perf/pmu-events/arch/x86/icelake/icl-metrics.json index 20210742171d..cc4edf855064 100644 --- a/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelake/icl-metrics.json @@ -111,21 +111,23 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT= _MISC.RECOVERY_CYCLES\\,cmask\\=3D1\\,edge@ / tma_info_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -372,11 +374,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -1378,11 +1381,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json b/too= ls/perf/pmu-events/arch/x86/icelakex/icx-metrics.json index ef25cda019be..6f25b5b7aaf6 100644 --- a/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json +++ b/tools/perf/pmu-events/arch/x86/icelakex/icx-metrics.json @@ -315,21 +315,23 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT= _MISC.RECOVERY_CYCLES\\,cmask\\=3D1\\,edge@ / tma_info_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -576,11 +578,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -1674,11 +1677,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json= b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json index 4f3dd85540b6..c732982f70b5 100644 --- a/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json +++ b/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json @@ -340,31 +340,34 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_inf= o_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Branch Misprediction", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound += topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tm= a_info_thread_slots", - "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueBM", + "MetricGroup": "BadSpec;BrMispredicts;Default;TmaL2;TopdownL2;tma_= L2_group;tma_bad_speculation_group;tma_issueBM", "MetricName": "tma_branch_mispredicts", "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_specula= tion > 0.15", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Branch Misprediction. These slots are either wasted= by uops fetched from an incorrectly speculated program path; or stalls whe= n the out-of-order part of the machine needs to recover its state from a sp= eculative path. Sample with: TOPDOWN.BR_MISPREDICT_SLOTS. Related metrics: = tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredict= ions, tma_mispredicts_resteers", "ScaleUnit": "100%" }, @@ -407,11 +410,12 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e Core non-memory issues were of a bottleneck", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)", - "MetricGroup": "Backend;Compute;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", + "MetricGroup": "Backend;Compute;Default;TmaL2;TopdownL2;tma_L2_gro= up;tma_backend_bound_group", "MetricName": "tma_core_bound", "MetricThreshold": "tma_core_bound > 0.1 & tma_backend_bound > 0.2= ", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots whe= re Core non-memory issues were of a bottleneck. Shortage in hardware compu= te resources; or dependencies in software's instructions are both categoriz= ed under Core Bound. Hence it may indicate the machine ran out of an out-of= -order resource; certain execution units are overloaded or dependencies in = program's data- or instruction-flow are limiting the performance (e.g. FP-c= hained long-latency arithmetic operations).", "ScaleUnit": "100%" }, @@ -509,21 +513,23 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend bandwidth issues", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)", - "MetricGroup": "FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group;tma_issueFB", + "MetricGroup": "Default;FetchBW;Frontend;TmaL2;TopdownL2;tma_L2_gr= oup;tma_frontend_bound_group;tma_issueFB", "MetricName": "tma_fetch_bandwidth", "MetricThreshold": "tma_fetch_bandwidth > 0.1 & tma_frontend_bound= > 0.15 & tma_info_thread_ipc / 6 > 0.35", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend bandwidth issues. For example; inefficien= cies at the instruction decoders; or restrictions for caching in the DSB (d= ecoded uops cache) are categorized under Fetch Bandwidth. In such cases; th= e Frontend typically delivers suboptimal amount of uops to the Backend. Sam= ple with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LA= TENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_sw= itches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_= info_inst_mix_iptb, tma_lcp", "ScaleUnit": "100%" }, { "BriefDescription": "This metric represents fraction of slots the = CPU was stalled due to Frontend latency issues", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.U= OP_DROPPING / tma_info_thread_slots", - "MetricGroup": "Frontend;TmaL2;TopdownL2;tma_L2_group;tma_frontend= _bound_group", + "MetricGroup": "Default;Frontend;TmaL2;TopdownL2;tma_L2_group;tma_= frontend_bound_group", "MetricName": "tma_fetch_latency", "MetricThreshold": "tma_fetch_latency > 0.1 & tma_frontend_bound >= 0.15", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU was stalled due to Frontend latency issues. For example; instruction-= cache misses; iTLB misses or fetch stalls after a branch misprediction are = categorized under Frontend Latency. In such cases; the Frontend eventually = delivers no uops for some period. Sample with: FRONTEND_RETIRED.LATENCY_GE_= 16_PS;FRONTEND_RETIRED.LATENCY_GE_8_PS", "ScaleUnit": "100%" }, @@ -611,11 +617,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -630,11 +637,12 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring heavy-weight operations -- instructions that require= two or more uops or micro-coded sequences", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", - "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", + "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_heavy_operations", "MetricThreshold": "tma_heavy_operations > 0.1", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring heavy-weight operations -- instructions that requir= e two or more uops or micro-coded sequences. This highly-correlates with th= e uop length of these instructions/sequences. Sample with: UOPS_RETIRED.HEA= VY", "ScaleUnit": "100%" }, @@ -1486,11 +1494,12 @@ }, { "BriefDescription": "This metric represents fraction of slots wher= e the CPU was retiring light-weight operations -- instructions that require= no more than one uop (micro-operation)", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)", - "MetricGroup": "Retire;TmaL2;TopdownL2;tma_L2_group;tma_retiring_g= roup", + "MetricGroup": "Default;Retire;TmaL2;TopdownL2;tma_L2_group;tma_re= tiring_group", "MetricName": "tma_light_operations", "MetricThreshold": "tma_light_operations > 0.6", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots whe= re the CPU was retiring light-weight operations -- instructions that requir= e no more than one uop (micro-operation). This correlates with total number= of instructions used by the program. A uops-per-instruction (see UopPI met= ric) ratio of 1 or less should be expected for decently optimized software = running on Intel Core/Xeon products. While this often indicates efficient X= 86 instructions were executed; high value does not necessarily mean better = performance cannot be achieved. Sample with: INST_RETIRED.PREC_DIST", "ScaleUnit": "100%" }, @@ -1540,11 +1549,12 @@ }, { "BriefDescription": "This metric represents fraction of slots the = CPU has wasted due to Machine Clears", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts= )", - "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group= ;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", + "MetricGroup": "BadSpec;Default;MachineClears;TmaL2;TopdownL2;tma_= L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", "MetricName": "tma_machine_clears", "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation= > 0.15", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= CPU has wasted due to Machine Clears. These slots are either wasted by uo= ps fetched prior to the clear; or stalls the out-of-order portion of the ma= chine needs to recover its state after the clear. For example; this can hap= pen due to memory ordering Nukes (e.g. Memory Disambiguation) or Self-Modif= ying-Code (SMC) nukes. Sample with: MACHINE_CLEARS.COUNT. Related metrics: = tma_clears_resteers, tma_contested_accesses, tma_data_sharing, tma_false_sh= aring, tma_l1_bound, tma_microcode_sequencer, tma_ms_switches, tma_remote_c= ache", "ScaleUnit": "100%" }, @@ -1576,11 +1586,12 @@ }, { "BriefDescription": "This metric represents fraction of slots the = Memory subsystem within the Backend was a bottleneck", + "DefaultMetricgroupName": "TopdownL2", "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + top= down\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_in= fo_thread_slots", - "MetricGroup": "Backend;TmaL2;TopdownL2;tma_L2_group;tma_backend_b= ound_group", + "MetricGroup": "Backend;Default;TmaL2;TopdownL2;tma_L2_group;tma_b= ackend_bound_group", "MetricName": "tma_memory_bound", "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0= .2", - "MetricgroupNoGroup": "TopdownL2", + "MetricgroupNoGroup": "TopdownL2;Default", "PublicDescription": "This metric represents fraction of slots the= Memory subsystem within the Backend was a bottleneck. Memory Bound estima= tes fraction of slots where pipeline is likely stalled due to demand load o= r store instructions. This accounts mainly for (1) non-completed in-flight = memory demand loads which coincides with execution units starvation; in add= ition to (2) cases where stores could impose backpressure on the pipeline w= hen many of them get buffered at the same time (less common out of the two)= .", "ScaleUnit": "100%" }, @@ -1784,11 +1795,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, diff --git a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json b/to= ols/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json index d0538a754288..83346911aa63 100644 --- a/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json +++ b/tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json @@ -105,21 +105,23 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere no uops are being delivered due to a lack of required resources for acc= epting new uops in the Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * cpu@INT= _MISC.RECOVERY_CYCLES\\,cmask\\=3D1\\,edge@ / tma_info_thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_backend_bound", "MetricThreshold": "tma_backend_bound > 0.2", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here no uops are being delivered due to a lack of required resources for ac= cepting new uops in the Backend. Backend is the portion of the processor co= re where the out-of-order scheduler dispatches ready uops into their respec= tive execution units; and once completed these uops get retired according t= o program order. For example; stalls due to data-cache misses or stalls due= to the divider unit being overloaded are both categorized under Backend Bo= und. Backend Bound is further divided into two main categories: Memory Boun= d and Core Bound. Sample with: TOPDOWN.BACKEND_BOUND_SLOTS", "ScaleUnit": "100%" }, { "BriefDescription": "This category represents fraction of slots wa= sted due to incorrect speculations", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + t= ma_retiring), 0)", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_bad_speculation", "MetricThreshold": "tma_bad_speculation > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= asted due to incorrect speculations. This include slots used to issue uops = that do not eventually get retired and slots for which the issue-pipeline w= as blocked due to recovery from earlier incorrect speculation. For example;= wasted work due to miss-predicted branches are categorized under Bad Specu= lation category. Incorrect data speculation followed by Memory Ordering Nuk= es is another example.", "ScaleUnit": "100%" }, @@ -366,11 +368,12 @@ }, { "BriefDescription": "This category represents fraction of slots wh= ere the processor's Frontend undersupplies its Backend", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topd= own\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UO= P_DROPPING / tma_info_thread_slots", - "MetricGroup": "PGO;TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_frontend_bound", "MetricThreshold": "tma_frontend_bound > 0.15", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots w= here the processor's Frontend undersupplies its Backend. Frontend denotes t= he first part of the processor core responsible to fetch operations that ar= e executed later on by the Backend part. Within the Frontend; a branch pred= ictor predicts the next address to fetch; cache-lines are fetched from the = memory subsystem; parsed into instructions; and lastly decoded into micro-o= perations (uops). Ideally the Frontend can issue Pipeline_Width uops every = cycle to the Backend. Frontend Bound denotes unutilized issue-slots when th= ere is no Backend stall; i.e. bubbles where Frontend delivered no uops whil= e Backend could have accepted them. For example; stalls due to instruction-= cache misses would be categorized under Frontend Bound. Sample with: FRONTE= ND_RETIRED.LATENCY_GE_4_PS", "ScaleUnit": "100%" }, @@ -1392,11 +1395,12 @@ }, { "BriefDescription": "This category represents fraction of slots ut= ilized by useful work i.e. issued uops that eventually get retired", + "DefaultMetricgroupName": "TopdownL1", "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdow= n\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_= thread_slots", - "MetricGroup": "TmaL1;TopdownL1;tma_L1_group", + "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", "MetricName": "tma_retiring", "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.= 1", - "MetricgroupNoGroup": "TopdownL1", + "MetricgroupNoGroup": "TopdownL1;Default", "PublicDescription": "This category represents fraction of slots u= tilized by useful work i.e. issued uops that eventually get retired. Ideall= y; all pipeline slots would be attributed to the Retiring category. Retiri= ng of 100% would indicate the maximum Pipeline_Width throughput was achieve= d. Maximizing Retiring typically increases the Instructions-per-cycle (see= IPC metric). Note that a high Retiring value does not necessary mean there= is no room for more performance. For example; Heavy-operations or Microco= de Assists are categorized under Retiring. They often indicate suboptimal p= erformance and can often be optimized or avoided. Sample with: UOPS_RETIRED= .SLOTS", "ScaleUnit": "100%" }, --=20 2.35.1